Python Application Analysis

Evaluation: Advanced Binary Fingerprinting

Python scanning supports binary package archives (.whl and .tar.gz) and coordinates listed in requirements.txt manifests for packages from the Python Package Index (PyPI). For the best results, we recommend first creating a Python virtual environment and resolving the dependencies by running pip install against the requirements file. This brings only the dependencies the project needs into the build and resolves any dependency ranges in the requirements file.

  • Example workflow. Check out a video demonstration here.
    1. cd project_folder
    2. mkdir ~/py-envs
    3. python3 -m venv ~/py-envs/project_folder
    4. source ~/py-envs/project_folder/bin/activate
    5. pip install -r requirements.txt

  • requirements.txt:
    • Use pip freeze to create the requirements file.
    • Lifecycle scanners only use the manifest named requirements.txt and ignore other variants.
    • Avoid version ranges for dependencies, as the Lifecycle scanner ignores them.
    • Each requirement must use the == operator and a version without wildcards.
    • Add environment markers to requirements.txt entries to scope them to the target OS/architecture, as described in the environment markers section of the Python documentation.
  • .whl files: a single file may be matched to multiple environment-specific Python packages. These appear as duplicates in the Lifecycle scan report.
  • Native scanner: Jake is an open-source scanning tool that scans Python & Conda environments for vulnerable third-party dependencies.
    • Jake may provide better results than other scanners as it can interact with the Python environment directly to see what is actually loaded. This will reduce the noise in the final scan report.
    • Using Jake removes the need for the Java runtime that other Lifecycle scanners require in developer or build environments.
    • Using Jake may produce different results than Lifecycle scanners, complicating enforcement efforts. For consistency, use Jake in both the developer and CI environments.
    • Installing and running Jake for each build may add significant time to the build process. Test first and set expectations before rolling out to production.
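
The requirements.txt rules listed above (exact == pins, no wildcards, no version ranges) can be checked before a scan. The following is a minimal sketch and not the Lifecycle scanner's own parser; the function name find_unpinned and the simplified regular expression are assumptions made for illustration.

```python
import re

# Simplified pattern: name, optional extras, "==", a version without "*",
# and an optional environment marker after ";".
# Assumption: an illustrative approximation, not the scanner's real parser.
PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*"   # package name
    r"(\[[^\]]*\])?"                 # optional extras, e.g. [security]
    r"\s*==\s*"                      # exact-version operator
    r"[A-Za-z0-9][A-Za-z0-9.!+_-]*"  # version without a wildcard
    r"\s*(;.*)?$"                    # optional environment marker
)


def find_unpinned(lines):
    """Return (line_number, text) for entries that are not exact pins."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        text = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if not text or text.startswith("-"):  # skip blanks and pip options
            continue
        if not PINNED.match(text):
            problems.append((number, text))
    return problems
```

Passing the lines of a requirements.txt file to find_unpinned returns the entries that use ranges or wildcards, which could then be re-pinned with pip freeze before scanning.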

Evaluation: Source code and manifest analysis

The Python coordinate-based matching feature provides the ability to scan and evaluate Python dependencies found in Python manifest files without running the pip install first. Files named requirements.txt (generated using a pip command) and poetry.lock (Poetry lock files) will be analyzed.
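
To see which manifests in a project tree would qualify by name, a short helper can list them. This is a sketch under the assumption that only exact file-name matches count, as the text states; find_manifests is a hypothetical helper and does not reflect how the scanner itself walks directories.

```python
from pathlib import Path

# File names taken from the text above; other variants are assumed to be skipped.
MANIFEST_NAMES = {"requirements.txt", "poetry.lock"}


def find_manifests(root):
    """Return sorted paths under root whose file name is a supported manifest."""
    return sorted(
        path for path in Path(root).rglob("*")
        if path.is_file() and path.name in MANIFEST_NAMES
    )
```

A file such as requirements-dev.txt would not appear in the result, which is a quick way to spot manifests that need renaming or merging before a scan.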

Converting from other formats

setup.py

A setup.py file can be used to generate a requirements.txt file by first installing its packages (e.g. via pip install .), ideally into a virtual environment, and then running:

  • pip freeze > requirements.txt for Python 2 or
  • pip3 freeze > requirements.txt for Python 3

What do we parse from the file?

requirements.txt

Only requirements that use the "==" operator and a version without wildcards are considered. One requirement may be matched to multiple distributions of the same Python package. Using the sys_platform marker makes the dependency more specific. For example:

altgraph==0.10.2
pywin32 ==1.0 ; sys_platform == 'win32'

poetry.lock (new in release 115)

Each dependency must have a name and an exact version to be evaluated. For example:

  • Name: six
  • Version: 1.16.0

And from "metadata.files":

  • Extension: whl
  • Qualifier: py2.py3-none-any

The package dependencies with name and exact version are also evaluated:

  • Name: colorama
  • Version: 0.4.4

Example file content

[[package]]
name = "six"
version = "1.16.0"
description = "Python 2 and 3 compatibility utilities"
category = "main"
optional = false
python-versions = ">=2.7, !=3.0.*, !=3.1.*, !=3.2.*"

[package.dependencies]
colorama = "0.4.4"

[metadata]
lock-version = "1.1"
python-versions = "^3.8"
content-hash = "7ae52da2736b4294be7a184e040cc78412add14e92b816077ede183f9e1c636c"

[metadata.files]
six = [
    {file = "six-1.16.0-py2.py3-none-any.whl", hash = "sha256:8abb2f1d86890a2dfb989f9a77cfcfd3e47c2a354b01111771326f8aa26e0254"},
    {file = "six-1.16.0.tar.gz", hash = "sha256:1e61c37477a1626458e36f7b1d82aa5c9b094fa4802892072e49de9c60c4c926"},
]
colorama = [
    {file = "colorama-0.4.4-py2.py3-none-any.whl", hash = "sha256:9f47eda37229f68eee03b24b9748937c7dc3868f906e8ba69fbcbdd3bc5dc3e2"},
    {file = "colorama-0.4.4.tar.gz", hash = "sha256:5941b2b48a20143d2267e95b1c2a7603ce057ee39fd88e7329b0c292aa16869b"},
]
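
The mapping from the lock file to name, version, extension, and qualifier can be sketched as follows. This is an illustration rather than the scanner's implementation: describe_file and lock_coordinates are hypothetical helpers, the qualifier is assumed to be the three trailing wheel tags (as in the six example above), and lock_coordinates assumes the lock-version 1.1 layout shown and needs Python 3.11 or later for tomllib.

```python
def describe_file(file_name):
    """Split a distribution file name into (extension, qualifier).

    Wheels follow name-version[-build]-python-abi-platform.whl, so the
    qualifier is taken to be the last three dash-separated tags.
    """
    if file_name.endswith(".whl"):
        tags = file_name[: -len(".whl")].split("-")
        return "whl", "-".join(tags[-3:])
    for extension in ("tar.gz", "tar.bz2", "zip"):
        if file_name.endswith("." + extension):
            return extension, ""  # source archives carry no tag qualifier
    return "", ""


def lock_coordinates(lock_text):
    """Yield (name, version, extension, qualifier) from poetry.lock text.

    Assumes file names are listed under [metadata.files], keyed by
    package name, as in the example above.
    """
    import tomllib  # local import so describe_file works on older Pythons

    data = tomllib.loads(lock_text)
    files = data.get("metadata", {}).get("files", {})
    for package in data.get("package", []):
        for entry in files.get(package["name"], []):
            extension, qualifier = describe_file(entry["file"])
            yield package["name"], package["version"], extension, qualifier
```

For the example file, the wheel entry for six would yield the extension whl with the qualifier py2.py3-none-any, while the tar.gz entry has no qualifier.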

Steps to analyze using the Sonatype IQ CLI

Create requirements

Run pip freeze

pip freeze > requirements.txt

The requirements.txt file is expected to be UTF-8 encoded. On Microsoft Windows, the cmd.exe encoding may need to be changed to UTF-8; refer to the Microsoft documentation for how to do this.
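
One way to sidestep shell redirection encoding altogether is to capture the pip freeze output in Python and write the file with an explicit encoding. This is a minimal sketch; pip_freeze and write_requirements are hypothetical helper names, and the sketch assumes pip is available to the running interpreter.

```python
import subprocess
import sys


def pip_freeze():
    """Return the output of `pip freeze` for the running interpreter."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def write_requirements(text, path="requirements.txt"):
    """Write the requirements text to path, explicitly encoded as UTF-8."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)
```

Calling write_requirements(pip_freeze()) from the activated virtual environment produces a UTF-8 file regardless of the console code page.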

Example file content


altgraph==0.10.2
backports-abc==0.5
backports.ssl-match-hostname==3.5.0.1
bdist-mpkg==0.5.0
certifi==2018.1.18
chardet==3.0.4
click==6.7
confire==0.2.0
Django==1.6
django-countries==3.3
django-make-app==0.1.3
docopt==0.6.2
enum34==1.1.6


Add environment markers (optional)

Adding environment markers can simplify the results by filtering out components that are not relevant to your deployment platform. Currently, only the sys_platform environment marker is supported.

Add the environment marker after each relevant component in requirements.txt. For example:

Django==1.6; sys_platform == 'win32'
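
When many entries need the same marker, appending it by script avoids hand edits. The helper below is a sketch; add_platform_marker is a hypothetical name, and it assumes that any line already containing ";" carries a marker and should be left alone.

```python
def add_platform_marker(requirement, platform):
    """Append a sys_platform marker to a pinned requirement line.

    Blank lines, comments, pip options, and lines that already carry a
    marker (contain ";") are returned unchanged.
    """
    requirement = requirement.strip()
    if not requirement or requirement.startswith(("#", "-")):
        return requirement
    if ";" in requirement:
        return requirement
    return f"{requirement}; sys_platform == '{platform}'"
```

Applying it to every line of a requirements.txt file with platform set to win32 produces entries in the same form as the Django example above.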

Run a scan

Invoke a Sonatype IQ CLI scan of the directory containing requirements.txt. For instructions, see Sonatype IQ CLI.

How to get the best results

For the best results, we suggest the following:

  • Create a requirements.txt file by running "pip freeze > requirements.txt".
  • Download the binaries listed in requirements.txt by running "pipenv run pip download -r requirements.txt -d <output_dir>".
  • Run a scan on the <output_dir>.
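
The download step can be scripted for use in a build job. This is a sketch under stated assumptions: download_command and download_binaries are hypothetical helpers, the default directory name wheels is a placeholder for <output_dir>, and pipenv is assumed to be installed. The scan itself is left to the Sonatype IQ CLI.

```python
import subprocess


def download_command(requirements="requirements.txt", output_dir="wheels"):
    """Build the pip download command from the steps above."""
    return ["pipenv", "run", "pip", "download",
            "-r", requirements, "-d", output_dir]


def download_binaries(requirements="requirements.txt", output_dir="wheels"):
    """Run the download; raises CalledProcessError if pip fails."""
    subprocess.run(download_command(requirements, output_dir), check=True)
    return output_dir
```

The directory returned by download_binaries is the one to pass to the scan as its target.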