Code dependency visualisation in Python using Sourcetrail

TL;DR: Use Sourcetrail to draw a tree diagram of every usage of a given object, see how entangled the code really is, and plan the object's refactoring or removal on that basis.

I am writing this post because grepping was not enough. Grep and similar tools (in my case PyCharm's "Find usages") show only one level up; then I have to find the usages of each usage. If the code branches off, each branch has several usages, and those usages branch off again, I get lost. At that point I can open a notepad and start writing the usages down, but that is time-consuming. The result is also unreadable, because I am trying to write down a tree structure by hand, and in the worst case, when usages cross each other, that is impossible without references or duplication.
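To make the bookkeeping problem concrete, here is a minimal sketch of what the notepad approach has to do by hand. It assumes the direct usages have already been collected (for example with grep) into a dictionary; the USED_BY data and the render_usage_tree helper are hypothetical, with symbol names only borrowed from my notes for illustration. A symbol reached a second time is printed as a back-reference instead of being duplicated.

```python
from typing import Dict, List, Set

# Hypothetical, hand-collected data: symbol -> symbols that use it directly.
# The relations are made up for illustration only.
USED_BY: Dict[str, List[str]] = {
    "settings.ES_CE_URLS": [
        "connections.ES_CE_CONNECTION",
        "command.load_cross_engagements",
    ],
    "connections.ES_CE_CONNECTION": [
        "command.load_affinity_ce",
        "command.load_cross_engagements",
    ],
}


def render_usage_tree(root: str, used_by: Dict[str, List[str]]) -> List[str]:
    """Return the usage tree of `root` as a list of indented lines.

    A symbol that was already printed is emitted as a back-reference
    rather than expanded again; this also stops cycles from recursing
    forever.
    """
    lines: List[str] = []
    seen: Set[str] = set()

    def walk(symbol: str, depth: int) -> None:
        indent = "    " * depth
        if symbol in seen:
            lines.append(f"{indent}{symbol} (see above)")
            return
        seen.add(symbol)
        lines.append(f"{indent}{symbol}")
        for user in used_by.get(symbol, []):
            walk(user, depth + 1)

    walk(root, 0)
    return lines


print("\n".join(render_usage_tree("settings.ES_CE_URLS", USED_BY)))
```

Even this toy version needs a "seen" set and back-references, which is exactly the part that is painful to maintain by hand in a notepad.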

The examples in this post are based on planning a large functionality removal. The functionality was already switched to another database. It was so large that code migration was done in several stages, and some components were completely abandoned. This was the last step - removing unused code.

This is the first example of writing down usage trees using grep and notepad. The list is incomplete because I just gave up at some point.

.
└── settings.ELASTICSEARCH_CROSS_ENGAGEMENT_POOL is used as
   ├── settings.ES_CE_URLS
       ├── command.elasticsearch_create_index
       ├── connections.ES_CE_CONNECTION
           ├── api.handlers.cross_engagement.handlers:backend (unused variable)
           ├── command.load_affinity_ce
          
          
       ├── command.ES_masters_finder
       ├── command elasticserach_update_mapping
       ├── command.load_cross_engagements

├────
├──── influencers.handlers._get_top_affinity_for_mapping_without_previous_month
├────── influencers.handlers._get_top_affinity_for_mapping
├──────── influencers.handlers._get_ranked_influencers_with_affinity_data
├──────── influencers.handlers.get_ranked_influencers
├────────── api.private.v1.influencers.endpoints.add_mappings_to_categories_all
├──────────── influencers^custom_category/mappings/add/all?$
├──────────── influencers^rank/?$
      - reports.helpers.export_helpers.ExportInfluencerHelper.__init__
            
       - influencers.handlers.get_available_filters
         - ?

At that point I started looking for code analysis tools and found Sourcetrail.

Sourcetrail is an open-source, cross-platform source code explorer that uses static analysis to provide code dependency visualization. It supports four languages: C, C++, Java and Python.

After starting the application, the first step is indexing the project. A few of the setup options are, in my opinion, important.

sourcetrail project setup

If the project uses a virtualenv, Sourcetrail should be pointed at it. Otherwise, it will use the default Python environment and won't link the project with third-party libraries. In my case it was a Django project, so I excluded the migrations, which are auto-generated code.
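Before starting a long indexing run, it can help to preview which files remain once the migrations are excluded. The sketch below does not read or write any Sourcetrail configuration; it only walks the project directory. The files_to_index helper is hypothetical, and it assumes the usual Django layout in which auto-generated code lives in directories named "migrations".

```python
from pathlib import Path
from typing import Iterable, List


def files_to_index(
    project_root: Path, excluded_dirs: Iterable[str] = ("migrations",)
) -> List[Path]:
    """Return project-relative paths of .py files outside excluded directories."""
    excluded = set(excluded_dirs)
    kept: List[Path] = []
    for path in project_root.rglob("*.py"):
        relative = path.relative_to(project_root)
        # Skip the file if any of its parent directories is excluded.
        if excluded.intersection(relative.parts[:-1]):
            continue
        kept.append(relative)
    return sorted(kept)


if __name__ == "__main__":
    # Assumption: the script is run from the project root.
    for path in files_to_index(Path(".")):
        print(path)
```

If the virtualenv lives inside the project directory, its directory name can be added to excluded_dirs for the preview as well.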

Next, Sourcetrail will show the Shallow Python Indexing checkbox, which I recommend leaving unselected. With shallow indexing, the code visualization is incomplete and does not show everything Sourcetrail can really do.

Compared to PyCharm, Sourcetrail indexes the project slowly and uses a lot of CPU. For my project it takes about an hour.

When the project is indexed, visualizations can be generated. In the last few months I have used Sourcetrail several times, always in the same way: open "Custom Trail", type the symbol and select "All referencing". I prefer the vertical layout, because then the diagram looks more like a tree.

sourcetrail custom trail

From now on, generating diagrams is very fast. The results are incomparably better than my plain-text attempt and require minimal work. Sourcetrail can export diagrams to PNG files; the image below was created this way.

sourcetrail exported diagram

Finally, I can see how many changes are needed to achieve the goal. My approach to this diagram was to write down the names of all the edges (from leaves to root) as a list of checkboxes. From then on, I was able to adapt the code step by step, each step being a separate commit. A numbered list of checkboxes also makes it possible to divide the work among several people.
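I built my checklist by hand from the exported image, but the leaves-to-root ordering can also be sketched in code. The example below assumes the diagram has been transcribed into a dictionary that maps each symbol to the symbols using it; the USED_BY data and the removal_checklist helper are hypothetical. It uses graphlib from the standard library (Python 3.9+), which is my choice for this illustration and not something Sourcetrail provides.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Dict, List, Set

# Hypothetical transcription of a diagram: symbol -> symbols that use it.
USED_BY: Dict[str, Set[str]] = {
    "settings.ES_CE_URLS": {
        "connections.ES_CE_CONNECTION",
        "command.load_cross_engagements",
    },
    "connections.ES_CE_CONNECTION": {"command.load_affinity_ce"},
}


def removal_checklist(used_by: Dict[str, Set[str]]) -> List[str]:
    """Return a numbered checklist ordered from leaves to root.

    TopologicalSorter treats each value as the set of nodes that must come
    before the key, so every user is listed before the symbol it uses.
    A cyclic usage graph raises graphlib.CycleError.
    """
    order = list(TopologicalSorter(used_by).static_order())
    return [f"{number}. [ ] {symbol}" for number, symbol in enumerate(order, start=1)]


for item in removal_checklist(USED_BY):
    print(item)
```

The order of independent items may vary between runs, but a symbol never appears before the code that still uses it, so each checkbox can become a separate commit.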

Conclusions

  • A code tree diagram comes in handy when grep is not enough.
  • Sourcetrail is a great tool for people who do not know the code base or don't remember it.
  • Diagrams exported to PNG help to plan the work and to share it with others.

Sourcetrail website https://www.sourcetrail.com