The idea of the Cheesecake project is to rank Python packages based on various empirical "kwalitee" factors, such as:
- whether the package can be downloaded from PyPI given its name
- whether the package can be downloaded from a full URL
- whether the package can be unpacked
- whether the unpack directory is the same as the package name
- whether the package can be installed into an alternate directory
- existence of certain files such as README, INSTALL, LICENSE, setup.py etc.
- existence of certain directories such as doc, test, demo, examples
- percentage of modules/functions/classes/methods with docstrings
- percentage of functions/methods that are unit tested (not currently implemented)
- average pylint score for all non-test and non-demo modules
Currently, the Cheesecake index is computed for individual packages obtained through a variety of methods (detailed below). One of the goals of the Cheesecake project is to automatically compute the Cheesecake index for all packages uploaded to the PyPI Cheese Shop (possibly at upload time) and to maintain a collection of Web pages with statistics related to the various indexes of the packages.
Cheesecake currently computes 3 types of indexes:
- installability index
- documentation index
- code kwalitee index
The algorithms for computing each index type are detailed below.
The concept of "kwalitee" originated in the Perl community. Here's a relevant quote:
It looks like quality, it sounds like quality, but it's not quite quality.
Kwalitee is an empiric measure of how good a specific body of code is. It defines quality indicators and measures the code along them. It is currently used by the CPANTS Testing Service to evaluate the 'goodness' of CPAN packages.
Since the Python package repository (aka PyPI) is hosted at the Cheese Shop, it stands to reason that the quality indicator of a PyPI package should be called the Cheesecake index!
To compute the Cheesecake index for a given project, run the cheesecake.py module from the command line and indicate either:
- the package short name (e.g. twill) or
- the package URL (e.g. http://darcs.idyll.org/~t/projects/twill-0.7.4.tar.gz) or
- the package path on the file system (e.g. /tmp/twill-latest.tar.gz)
In all cases, the cheesecake module will attempt to download the package if necessary, then to unpack it in a sandbox directory (/tmp/cheesecake_sandbox by default). If either of these operations fails, the Cheesecake index for the package will be 0. If the package can be successfully unpacked, the cheesecake module will compute the values for a variety of indexes detailed in the algorithm given at the end of this file.
If the package can be successfully downloaded and unpacked, a log file is created in the sandbox directory and named <package>.log (e.g. the log file for twill-0.7.4.tar.gz is /tmp/cheesecake_sandbox/twill-0.7.4.tar.gz.log). The log file is not automatically deleted after the Cheesecake index is computed, since its purpose is to be inspected for debug information.
Command-line examples:
Compute the Cheesecake index for the Durus package by using setuptools utilities to download the package from PyPI:
python cheesecake.py --name=Durus
Compute the Cheesecake index for the Durus package by indicating its URL:
python cheesecake.py --url=http://www.mems-exchange.org/software/durus/Durus-3.1.tar.gz
Compute the Cheesecake index for the twill package by indicating its path on the local file system:
python cheesecake.py --path=/tmp/twill-latest.tar.gz
To increase the verbosity of the output, use the -v or --verbose option. For more options, run cheesecake.py with -h or --help.
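The options named above could be parsed along the following lines. This is only a sketch: it uses the standard argparse module, which is not necessarily what cheesecake.py uses, and both the build_parser name and the choice to make the three package sources mutually exclusive and required are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the options mentioned in the text (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="cheesecake.py")
    # Assumption: exactly one way of locating the package must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-n", "--name", help="package short name, e.g. twill")
    source.add_argument("-u", "--url", help="full URL of the package archive")
    source.add_argument("-p", "--path", help="path to the archive on the local file system")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="increase the verbosity of the output")
    return parser


args = build_parser().parse_args(["--name=Durus"])
print(args.name, args.url, args.path, args.verbose)
```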
The Cheesecake project has not yet been released as a tarball or a Python egg. You can obtain the source code from SourceForge via CVS:
cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/cheesecake co -P cheesecake
Developer mailing list: http://lists.sourceforge.net/lists/listinfo/cheesecake-devel
Cheesecake is licensed under the Python Software Foundation license, the same license that governs Python itself. The text of the license is available in the LICENSE file in the source code distribution and can also be downloaded from http://www.opensource.org/licenses/PythonSoftFoundation.php.
Grig Gheorghiu
Email: <grig at gheorghiu dot net>
Web site: http://agiletesting.blogspot.com
The cheesecake.py module uses the following constants:
INDEX_PYPI_DOWNLOAD = 50
INDEX_PYPI_DISTANCE = 5
INDEX_URL_DOWNLOAD = 25
INDEX_UNPACK = 25
INDEX_UNPACK_DIR = 15
INDEX_INSTALL = 50
INDEX_FILE_CRITICAL = 15
INDEX_FILE = 10
INDEX_FILE_PYC = 20
INDEX_DIR_CRITICAL = 25
INDEX_DIR = 20
INDEX_DIR_EMPTY = 5
MAX_INDEX_DOCSTRINGS = 100 # max. percentage of modules/classes/methods/functions with docstrings
MAX_INDEX_PYLINT = 100 # max. pylint score
Step 0
Initialize the Cheesecake index to 0. Also initialize to 0 the partial Cheesecake indexes for installability, documentation and code kwalitee.
Compute the maximum overall Cheesecake index that can be reached by any given package, which is the sum:
INDEX_PYPI_DOWNLOAD + INDEX_UNPACK + INDEX_UNPACK_DIR + INDEX_INSTALL + MAX_INDEX_DOCSTRINGS + MAX_INDEX_PYLINT + (INDEX_FILE * number_of_expected_files) + (INDEX_FILE_CRITICAL * number_of_expected_critical_files) + (INDEX_DIR * number_of_expected_dirs) + (INDEX_DIR_CRITICAL * number_of_expected_critical_dirs)
Compute the maximum Cheesecake index for installability, which is the sum:
INDEX_PYPI_DOWNLOAD + INDEX_UNPACK + INDEX_UNPACK_DIR + INDEX_INSTALL
Compute the maximum Cheesecake index for documentation, which is the sum:
(INDEX_FILE * number_of_expected_files) + (INDEX_FILE_CRITICAL * number_of_expected_critical_files) + (INDEX_DIR * number_of_expected_dirs) + (INDEX_DIR_CRITICAL * number_of_expected_critical_dirs) + MAX_INDEX_DOCSTRINGS
Compute the maximum Cheesecake index for code kwalitee, which is currently:
MAX_INDEX_PYLINT
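The four sums in Step 0 can be written out as a short sketch. The max_indexes function and its parameter names are hypothetical; the constant values are the ones listed above, and the counts passed in at the end are taken from the file and directory lists given in Steps 5 and 6.

```python
# Constants copied from the list of constants in this document.
INDEX_PYPI_DOWNLOAD = 50
INDEX_UNPACK = 25
INDEX_UNPACK_DIR = 15
INDEX_INSTALL = 50
INDEX_FILE_CRITICAL = 15
INDEX_FILE = 10
INDEX_DIR_CRITICAL = 25
INDEX_DIR = 20
MAX_INDEX_DOCSTRINGS = 100
MAX_INDEX_PYLINT = 100


def max_indexes(n_files, n_critical_files, n_dirs, n_critical_dirs):
    """Return the maximum (installability, documentation, code kwalitee, overall) indexes."""
    installability = (INDEX_PYPI_DOWNLOAD + INDEX_UNPACK
                      + INDEX_UNPACK_DIR + INDEX_INSTALL)
    documentation = (INDEX_FILE * n_files
                     + INDEX_FILE_CRITICAL * n_critical_files
                     + INDEX_DIR * n_dirs
                     + INDEX_DIR_CRITICAL * n_critical_dirs
                     + MAX_INDEX_DOCSTRINGS)
    code_kwalitee = MAX_INDEX_PYLINT
    # The overall maximum is the sum of the three partial maxima.
    overall = installability + documentation + code_kwalitee
    return installability, documentation, code_kwalitee, overall


# 8 regular files, 3 critical files, 2 regular dirs, 2 critical dirs.
maxima = max_indexes(n_files=8, n_critical_files=3, n_dirs=2, n_critical_dirs=2)
print(maxima)
```

With these counts the overall maximum agrees with the 555 points reported in the sample output. The documentation maximum computed this way differs from the 415 shown in that output, so the implementation apparently counted that partial maximum differently at the time the sample was produced.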
Step 1a
If the short name of the package was specified with -n or --name, try to download the package from the PyPI index page by following the links to the package home page and the package download URL (this is accomplished using setuptools utilities).
If not successful, exit with a Cheesecake index of 0. If successful and the package was found at the Cheese Shop, add INDEX_PYPI_DOWNLOAD to the overall Cheesecake index and to the installability Cheesecake index.
If successful but the package was not found at the Cheese Shop, add INDEX_PYPI_DOWNLOAD - (INDEX_PYPI_DISTANCE * number_of_links_to_package) to the overall Cheesecake index and to the installability Cheesecake index.
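The scoring rule of Step 1a, as a minimal sketch. The function name pypi_download_index and its parameters are hypothetical; the download itself (done with setuptools utilities) is not shown.

```python
INDEX_PYPI_DOWNLOAD = 50
INDEX_PYPI_DISTANCE = 5


def pypi_download_index(found_at_cheese_shop, number_of_links_to_package=0):
    """Points awarded for a successful download by package short name."""
    if found_at_cheese_shop:
        return INDEX_PYPI_DOWNLOAD
    # Each link that had to be followed away from PyPI costs INDEX_PYPI_DISTANCE.
    return INDEX_PYPI_DOWNLOAD - INDEX_PYPI_DISTANCE * number_of_links_to_package


print(pypi_download_index(False, number_of_links_to_package=1))
```

For a package reached by following one link, this gives 45 points, which is the index_pypi_download value shown for Durus in the sample output.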
Step 1b
If the full URL of the package was specified with -u or --url, try to download the package from the specified URL.
If not successful, exit with a Cheesecake index of 0. If successful, add INDEX_URL_DOWNLOAD to the overall Cheesecake index and to the installability Cheesecake index.
Step 1c
If the path to the package on the local file system was specified with -p or --path, copy the package to the sandbox directory.
Step 2
Unpack the package. Currently supported archive types are zip and tar.gz/tgz; support for Python Eggs is planned for the near future.
If not successful, exit with a Cheesecake index of 0. If successful, add INDEX_UNPACK to the overall Cheesecake index and to the installability Cheesecake index.
Step 3
Check that the unpack directory has the same name as the package (e.g. when unpacking twill-0.7.4.tar.gz, we expect the unpack directory to be twill-0.7.4).
If the unpack directory name is the same as the package name, add INDEX_UNPACK_DIR to the overall Cheesecake index and to the installability Cheesecake index.
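The directory-name check can be sketched as follows. The helper names and the list of archive extensions are assumptions for illustration; the expected name is simply the archive file name with its extension removed.

```python
import os

INDEX_UNPACK_DIR = 15
# Assumption: the extensions of the archive types listed in Step 2.
ARCHIVE_EXTENSIONS = (".tar.gz", ".tgz", ".zip")


def expected_unpack_dir(archive_name):
    """Return the archive file name without its archive extension."""
    base = os.path.basename(archive_name)
    for extension in ARCHIVE_EXTENSIONS:
        if base.endswith(extension):
            return base[:-len(extension)]
    return base


def unpack_dir_index(archive_name, unpack_dir):
    """Points awarded when the unpack directory is named after the package."""
    actual = os.path.basename(os.path.normpath(unpack_dir))
    if actual == expected_unpack_dir(archive_name):
        return INDEX_UNPACK_DIR
    return 0


print(unpack_dir_index("twill-0.7.4.tar.gz", "/tmp/cheesecake_sandbox/twill-0.7.4"))
```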
Step 4
Install the package to a temporary directory in a non-default location. If successful, add INDEX_INSTALL to the overall Cheesecake index and to the installability Cheesecake index.
Step 5
Check for existence of specific files. For each file found, add INDEX_FILE to the overall Cheesecake index and to the documentation Cheesecake index. If the file is deemed critical, add INDEX_FILE_CRITICAL instead.
The following special files ("cheese_files") are currently checked:
cheese_files = ["install", "changelog",
"news", "faq",
"todo", "thanks", "announce",
"ez_setup.py",
]
The following files are currently deemed critical:
critical_cheese_files = ["readme", "license", "setup.py"]
To check if a file FILE is among the cheese files, the following regular expression is used:
re.search(r"^%s(\.txt)*" % cheese_file, FILE, re.IGNORECASE)
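Step 5 can be sketched with the two lists and the regular expression given above. The has_cheese_file and file_indexes helpers are hypothetical, and the list of file names passed in at the end is made up for the example.

```python
import re

INDEX_FILE = 10
INDEX_FILE_CRITICAL = 15

cheese_files = ["install", "changelog", "news", "faq",
                "todo", "thanks", "announce", "ez_setup.py"]
critical_cheese_files = ["readme", "license", "setup.py"]


def has_cheese_file(cheese_file, filenames):
    """True if any file name matches the pattern used for cheese files."""
    pattern = r"^%s(\.txt)*" % cheese_file
    return any(re.search(pattern, name, re.IGNORECASE) for name in filenames)


def file_indexes(filenames):
    """Map each expected file to the points it earns (0 if not found)."""
    scores = {}
    for cheese_file in cheese_files:
        found = has_cheese_file(cheese_file, filenames)
        scores[cheese_file] = INDEX_FILE if found else 0
    for cheese_file in critical_cheese_files:
        found = has_cheese_file(cheese_file, filenames)
        scores[cheese_file] = INDEX_FILE_CRITICAL if found else 0
    return scores


found = file_indexes(["FAQ.txt", "INSTALL.txt", "LICENSE.txt", "README.txt", "setup.py"])
print(sum(found.values()))
```

Because the pattern is anchored only at the start of the name, a file such as readme_old would also count as a README under this expression.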
Step 6
Check for existence of specific directories. For each directory found, add INDEX_DIR to the overall Cheesecake index and to the documentation Cheesecake index. If the directory is deemed critical, add INDEX_DIR_CRITICAL instead. If the directory is found empty, add INDEX_DIR_EMPTY instead.
The following directories ("cheese_dirs") are currently checked:
cheese_dirs = ["example", "demo"]
The following directories are currently deemed critical:
critical_cheese_dirs = ["doc", "test"]
To check if a directory DIR is among the cheese directories, the following regular expression is used:
re.search(r"^%s" % cheese_dir, DIR, re.IGNORECASE)
Step 7
Check for existence of .pyc files. If found, decrease the score by subtracting INDEX_FILE_PYC from the overall Cheesecake index and from the documentation Cheesecake index.
Step 8
Compute the percentage of modules/classes/methods/functions that have docstrings associated with them. Only Python modules that are not in test, doc, demo and example directories are checked. Round up the percentage and add it to the overall Cheesecake index and to the documentation Cheesecake index.
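One way to count docstrings is with the standard ast module, as sketched below. This is an illustration of the idea, not the method cheesecake.py uses; the function names are hypothetical, and filtering out modules in the excluded directories is left to the caller.

```python
import ast
import math


def docstring_counts(source):
    """Return (documented, total) for the module, classes, functions and methods in source."""
    tree = ast.parse(source)
    documentable = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    total = 0
    documented = 0
    for node in ast.walk(tree):
        if isinstance(node, documentable):
            total += 1
            if ast.get_docstring(node) is not None:
                documented += 1
    return documented, total


def docstring_index(documented, total):
    """Percentage of documented objects, rounded up as described in Step 8."""
    if total == 0:
        return 0
    return math.ceil(100.0 * documented / total)


print(docstring_index(104, 249))
```

The counts in the last line are the ones reported for Durus in the sample output (104 of 249 objects), which round up to the 42 points shown there.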
Step 9
If pylint is present on the system, run pylint against all Python files that are not in the test, docs or demo directories. Average the non-negative pylint scores, multiply the average by 10 and add it to the overall Cheesecake index and to the code kwalitee Cheesecake index.
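The averaging rule in Step 9 can be sketched as follows, taking per-module pylint scores that have already been collected. The pylint_index function is hypothetical; running pylint itself is not shown, and returning 0 when no non-negative score exists is an assumption.

```python
def pylint_index(scores):
    """Average the non-negative pylint scores (each out of 10) and scale to 0-100."""
    non_negative = [score for score in scores if score >= 0]
    if not non_negative:
        # Assumption: no usable scores means no points.
        return 0
    return 10 * sum(non_negative) / len(non_negative)


print(pylint_index([8.0, 4.0, -2.5]))
```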
Step 10
For each of the partial Cheesecake index types (installability, documentation and code kwalitee), display the absolute Cheesecake index for that type as the sum of all indexes of that type computed in the previous steps. Also display the relative Cheesecake index for that type as the percentage of (absolute_index / maximum_index).
Display the absolute Cheesecake index for the package as the sum of all indexes computed in the previous steps. Also display the relative Cheesecake index for the package as the percentage of (absolute_index / maximum_index).
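The relative index is a plain percentage. In the sketch below the relative_index function is hypothetical, and the choice to truncate rather than round is inferred from the figures in the sample output, not stated in the algorithm.

```python
def relative_index(absolute_index, maximum_index):
    """Relative index as a whole-number percentage, truncated."""
    return absolute_index * 100 // maximum_index


for absolute, maximum in [(135, 140), (157, 415), (64, 100), (356, 555)]:
    print(absolute, maximum, relative_index(absolute, maximum))
```

The four pairs are the absolute and maximum values from the Durus sample output.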
$ python cheesecake.py -n Durus
[cheesecake:console] Trying to download package durus from PyPI using setuptools utilities
[cheesecake:console] Downloaded package Durus-3.1.tar.gz from http://www.mems-exchange.org/software/durus/Durus-3.1.tar.gz
[cheesecake:console] Detailed info available in log file /tmp/cheesecake_sandbox/durus.log
[cheesecake:console] A given package can currently reach a MAXIMUM number of 555 points
[cheesecake:console] Starting computation of Cheesecake index for package 'Durus-3.1.tar.gz'
[cheesecake:console] Starting computation of INSTALLABILITY index (max. points = 140)
index_pypi_download ..................... 45 (downloaded package Durus-3.1.tar.gz following 1 link from PyPI)
index_unpack ............................ 25 (package untar-ed successfully)
index_unpack_dir ........................ 15 (unpack directory is Durus-3.1 as expected)
index_install ........................... 50 (package installed in /tmp/cheesecake_sandbox/tmp_install_Durus-3.1)
---------------------------------------------
INSTALLABILITY INDEX (ABSOLUTE) ......... 135
INSTALLABILITY INDEX (RELATIVE) ......... 96 (135 out of a maximum of 140 points is 96%)
[cheesecake:console] Starting computation of DOCUMENTATION index (max. points = 415)
index_file_announce ..................... 0 (file not found)
index_file_changelog .................... 0 (file not found)
index_file_ez_setup.py .................. 0 (file not found)
index_file_faq .......................... 10 (file found)
index_file_install ...................... 10 (file found)
index_file_license ...................... 15 (critical file found)
index_file_news ......................... 0 (file not found)
index_file_readme ....................... 15 (critical file found)
index_file_setup.py ..................... 15 (critical file found)
index_file_thanks ....................... 0 (file not found)
index_file_todo ......................... 0 (file not found)
index_dir_demo .......................... 0 (directory not found)
index_dir_doc ........................... 25 (critical directory found)
index_dir_example ....................... 0 (directory not found)
index_dir_test .......................... 25 (critical directory found)
index_docstrings ........................ 42 (found 104/249=41.77% modules/classes/methods/functions with docstrings)
---------------------------------------------
DOCUMENTATION INDEX (ABSOLUTE) .......... 157
DOCUMENTATION INDEX (RELATIVE) .......... 37 (157 out of a maximum of 415 points is 37%)
[cheesecake:console] Starting computation of CODE KWALITEE index (max. points = 100)
index_pylint ............................ 64 (average score is 6.30 out of 10)
---------------------------------------------
CODE KWALITEE INDEX (ABSOLUTE) .......... 64
CODE KWALITEE INDEX (RELATIVE) .......... 64 (64 out of a maximum of 100 points is 64%)
=============================================
OVERALL CHEESECAKE INDEX (ABSOLUTE) ..... 356
OVERALL CHEESECAKE INDEX (RELATIVE) ..... 64 (356 out of a maximum of 555 points is 64%)
Cheesecake is under very active development. The immediate goal is to add the unit test index measurement, followed by other metrics inspired by the kwalitee indicators. Please edit the IndexMeasurementIdeas Wiki page to add things that you would like to see covered by the Cheesecake metrics.