Task: Find the Most Recent Analysis of a Named Project with a Python Script

CodeSonar® 9.2p0

CONFIDENTIAL

CodeSecure Inc

You can use a script to find the most recent analysis of a named project.

This task provides a Python script that finds the most recent analysis of a project (specified by name), along with some suggestions for modifying the script to suit your needs.

For other scripting options, see:

Note: The Python script in this task is structured to be as similar as possible to those in the other download script tasks listed above.

Some Python-specific script authoring notes are provided below.
For a more 'Pythonic' download script example, see getlogs.py.

Permissions Required
Other Requirements
The Download Script
Using the Script
Modifying the Script
Writing Other Scripts
Links

Permissions Required

The script requires that special user Anonymous has the following permissions for the project P of interest and its most recent analysis A.

See Modifying the Script for information on modifying the script to specify credentials for a non-Anonymous user with the required permissions.

Other Requirements

The example script imports standard Python modules argparse, csv, os, re, shutil, and subprocess.

You will need a Python installation to run the script. If you do not have a local installation, you can use the cspython shipped with CodeSonar:

Make sure that $CSONAR/codesonar/bin is in your PATH, where $CSONAR is the CodeSonar installation directory.
To run the script, invoke Python with cspython (rather than python as in the examples below).

Use the cURL shipped with CodeSonar: $CSONAR/third-party/curl/inst/bin/curl, where $CSONAR is the CodeSonar installation directory. Either:

add $CSONAR/third-party/curl/inst/bin/ to the front of your PATH,
or
edit the script so that the setting of curl_cmd specifies the full path to $CSONAR/third-party/curl/inst/bin/curl.

The Download Script

The following script downloads the CSV project search results for the project (or projects) whose name matches the second script argument, from the hub specified in the first argument. It then uses these results to generate a summary CSV document that lists, for each project found, one of the following:

The analysis ID of the project's last analysis, if it can be identified.
The project's project ID if the last analysis cannot be identified. (That is, either there are no analyses of the project or there are analyses but Anonymous does not have ANALYSIS_EXISTS permission for the last analysis.)

Both CSV files are stored in the directory specified in the third argument to the script.

import argparse
import csv
import os
import re
import shutil
import subprocess


# Find the most recent analysis of the named project and record its
# Analysis ID. Note that there may be multiple projects with the same
# name, in which case information for all of them will be recorded.
def find_last_analysis(hubaddr, project_name, savedir):
    curl_cmd = ['curl']

    project_search_csv = '{0}/project_search.csv?query=project%3D"{1}"&scope=all'.format(hubaddr, project_name)

    result_fname='overall_results.csv'
    search_fname='project_search_results.csv'

    search_cmdline = curl_cmd + ['-o',search_fname] + [project_search_csv]
    subprocess.check_call(search_cmdline)
    search_results=[]

    no_analysis = 'No most recent analysis found. Either: a) the project has '\
                  + 'no analyses or b) you do not have ANALYSIS_EXISTS '\
                  + 'permission for the most recent analysis.'
    num_results = 0
    with open(search_fname, 'r', newline='') as csvfile:
        search_results = csv.DictReader(csvfile)

        with open(result_fname, 'w', newline='') as outfile:
            fieldnames = ('project id',
                          'most recent analysis id',
                          'notes',)
            writer = csv.DictWriter(outfile, fieldnames=fieldnames)
            writer.writeheader()
            for row in search_results:
                result_match = re.search(r'.*/(?P<rtype>analysis|project)/(?P<rid>\d+)\.csv',
                                         row['url'])
                # If the last column of a result line is neither a project
                # URL nor an analysis URL, report a problem.
                if result_match is None:
                    print('Unexpected result {0} in {1}.'.format(str(row),
                                                                 search_fname))
                    continue

                num_results += 1
                # An analysis URL identifies the most recent analysis.
                if result_match.group('rtype')=='analysis':
                    writer.writerow(
                        {'most recent analysis id':result_match.group('rid')})
                else:
                    # A project URL indicates that the most recent analysis
                    # of that project cannot be reported.
                    writer.writerow(
                        {'project id':result_match.group('rid'),
                         'notes':no_analysis})

    if num_results==0:
        # If no results, report and exit.
        print('No projects with URL-encoded name', project_name, 'were found.')
        print('This may indicate one or more of the following.')
        print('- The project name was not specified correctly (remember to URL-encode).') 
        print('- You do not have PROJECT_EXISTS permission for the specified project.')
        raise SystemExit(1)

    print('Found {0} project(s) with URL-encoded name {1}.'.format(num_results,
                                                                   project_name))
    print('Your results are in {0}/{1}'.format(savedir,result_fname))
    

# Set up.
def go():
    parser = argparse.ArgumentParser(
        description=('Find the most recent analysis of '
                     + 'a project on a CodeSonar hub, '
                     + 'as specified by the command-line arguments.'))
        
    parser.add_argument("hub",
                        help="The hub URL.")
    parser.add_argument("pname",
                        help="The project name; must be URL-encoded.")
    parser.add_argument("outdir",
                        help="The save directory.")
    args = parser.parse_args()

    allargs = (args.hub, args.pname, args.outdir)
    if not any([a is None for a in allargs]):
        if os.path.isdir(args.outdir):
            print('Output directory {0} exists, deleting and recreating.'.format(args.outdir))
            shutil.rmtree(args.outdir)
        os.mkdir(args.outdir)
        os.chdir(args.outdir)

        find_last_analysis(*allargs)

go()

This Python script works as follows.

It constructs a query string using the project search language to specify that
- the project name (field-name project) ...
- ... must case-insensitively match (=, which is URL-encoded as %3D) ...
- ... the name specified in the second argument to the script.
It accesses the specified hub to download the CSV version of the Project Search Results page for the constructed query string, saving it in the specified output directory with basename project_search_results.csv.
This is the same file you would download if you did the following.
1. Navigate to the Home page in the web GUI.
2. Select projects on the hub from the search domain/scope menu.
3. Enter project="<pname>" in the search text box, where <pname> is the name of the project you want to find. (Do not URL-escape <pname>.)
4. Click the Search button.
  (A Project Search Results page will open to show the search results.)
5. Click the CSV link in the Project Search Results page to obtain the CSV version of the page.
You may like to try this now so you can see the following.
- How your search is transformed into a query string in the HTML Project Search Results page URL (in particular, note that URL-escaping has now been applied to <pname>).
- How the CSV file is formatted.
For each line in the CSV file (representing a project whose name is <pname>) except for the first (which is a header line), it does the following.
1. Inspects the last entry in the line (representing the URL corresponding to the result), extracting the following information.
  1. Whether it is for an Analysis page or a Project page.
    If the former, this is the Analysis page corresponding to the last analysis of the project. If the latter, no last analysis could be found and the hub has fallen back to linking to the project's Project page.
    For more information about CodeSonar URL schemes, see GUI Reference: Page Type Properties.
  2. The "ID" component of the URL: this is an Analysis ID if the URL is for an Analysis page; a Project ID if the URL is for a Project page.
2. Uses this extracted information to compose and write a line in the result CSV file, which is stored in the specified output directory with basename overall_results.csv.
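
The URL inspection step described above can be tried in isolation. The following is a minimal sketch that reuses the regular expression from the script; the function name classify_result_url and the example URLs are hypothetical and are used for illustration only.

```python
import re

# Same pattern as the script: the URL contains /analysis/<id>.csv
# or /project/<id>.csv.
RESULT_URL = re.compile(r'.*/(?P<rtype>analysis|project)/(?P<rid>\d+)\.csv')


def classify_result_url(url):
    """Return (page type, ID) for a result URL, or None if the URL is
    neither an Analysis page URL nor a Project page URL."""
    match = RESULT_URL.search(url)
    if match is None:
        return None
    return match.group('rtype'), match.group('rid')


# Hypothetical URLs, for illustration only.
print(classify_result_url('http://[::1]:7340/analysis/42.csv'))
print(classify_result_url('http://[::1]:7340/project/7.csv'))
print(classify_result_url('http://[::1]:7340/index.html'))
```

An 'analysis' result carries an Analysis ID, a 'project' result carries a Project ID, and None corresponds to the "Unexpected result" case reported by the script.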

Using the Script

To use this script with your hub, do the following.

Create a directory to save your result files in. The remainder of these instructions will refer to this directory as savedir.
Copy find_project_analysis.py to a suitable directory. The remainder of these instructions will refer to this directory as rundir.
This should not be anywhere under savedir, since the script starts by deleting everything in that directory.

Run the script:

cd rundir
python find_project_analysis.py protocol://host:port projname savepath

where

protocol	is the protocol for your hub: http or https.
host:port	is the location of your hub.
projname	is the name of the project whose last analysis you want to find. It must be URL-encoded.
savepath	is the path to the savedir directory you created in the first step.

The script will output information about each file it downloads.

Inspect the contents of savedir. It should contain two files.
- overall_results.csv: the result summary computed by the script.
- project_search_results.csv: the Project Search Results page downloaded from the hub.
Open overall_results.csv to see your results.
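
If you want to consume the summary from another script rather than reading it by hand, something along the following lines may be useful. This is a sketch only: it assumes the column names written by the script above ('project id', 'most recent analysis id', 'notes'), and the function name summarize_results is hypothetical.

```python
import csv


def summarize_results(csv_lines):
    """Split the rows of the summary CSV into analysis IDs and the IDs of
    projects for which no most recent analysis was reported.

    csv_lines is an open file or any other iterable of CSV text lines."""
    analysis_ids, unresolved_projects = [], []
    for row in csv.DictReader(csv_lines):
        if row['most recent analysis id']:
            analysis_ids.append(row['most recent analysis id'])
        elif row['project id']:
            unresolved_projects.append(row['project id'])
    return analysis_ids, unresolved_projects


# In-memory example with made-up IDs, for illustration only.
example = ['project id,most recent analysis id,notes',
           ',42,',
           '7,,No most recent analysis found.']
print(summarize_results(example))
```

To process a real result file, pass an open file instead of the example list: open savedir/overall_results.csv with newline='' and hand the file object to summarize_results().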

Invocation Examples

Using the hub at http://[::1]:7341, find the most recent analysis of the project named "My Favorite Project", saving the results in directory /tmp/csvout:

python find_project_analysis.py http://[::1]:7341 My%20Favorite%20Project /tmp/csvout

Troubleshooting

Get more verbose output	For more verbose curl output, edit find_project_analysis.py so that curl is invoked with the -v flag. For example: curl_cmd=['curl', '-v']
No project_search_results.csv file	If your output directory does not contain file project_search_results.csv, curl could not download the CSV file. The possible causes are as follows.
- It is trying to download a page that does not exist. Check to make sure that you can open the CSV file URL in a web browser. You may need to pass a different hub location argument to the script.
- You have an HTTPS-enabled hub with a self-signed hub server certificate. To instruct curl to accept self-signed certificates, edit find_project_analysis.py so that curl is invoked with the -k flag. For example: curl_cmd=['curl', '-k']
project_search_results.csv lists no results	If your output directory contains file project_search_results.csv but it does not list any results, there are two possibilities.
- There are no projects on the hub whose (URL-encoded) name matches the one specified. Check that you have specified the name correctly, and that it is URL-encoded. If necessary, try executing your search.
- There are one or more such projects, but Anonymous does not have PROJECT_EXISTS permission for any of them. You will need to specify credentials for a user with the required permissions.
Similarly, if project_search_results.csv lists fewer results than expected, it is likely to be because Anonymous does not have PROJECT_EXISTS permission for all matching projects.
overall_results.csv lists project IDs rather than analysis IDs	This indicates that Anonymous has PROJECT_EXISTS permission for the project(s) of interest, but does not have ANALYSIS_EXISTS permission for the most recent analysis of each. You will need to specify credentials for a user with the required permissions.

Modifying the Script

You may wish to make one or more of the following modifications.

Match project name substrings, rather than matching exactly
Provide credentials for a non-Anonymous user

Match project name substrings, rather than matching exactly

To change a search to perform substring matching rather than exact matching, we need to edit the corresponding field-condition to change its operator from = to :.
Because of URL-encoding, this becomes a change from %3D to %3A in the query string constructed by the script.

Edit the script to change the line setting project_search_csv to

project_search_csv = f'{hubaddr}/project_search.csv?query=project%3A"{project_name}"&scope=all'

Try running the edited script to see how substring matching affects the results.
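
If you expect to switch between exact and substring matching, one option is to make the operator a parameter. The following is a minimal sketch: the function name project_search_csv_url and the substring parameter are hypothetical, and the path and query layout are copied from the script in The Download Script.

```python
def project_search_csv_url(hubaddr, project_name, substring=False):
    """Build the project search CSV URL.

    project_name must already be URL-encoded. Only the operator differs
    between the two modes: '=' (%3D) for exact matching, ':' (%3A) for
    substring matching."""
    operator = '%3A' if substring else '%3D'
    return '{0}/project_search.csv?query=project{1}"{2}"&scope=all'.format(
        hubaddr, operator, project_name)


# Hypothetical hub location and project name, for illustration only.
print(project_search_csv_url('http://[::1]:7340', 'My%20Favorite%20Project'))
print(project_search_csv_url('http://[::1]:7340', 'Favorite', substring=True))
```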

Provide credentials for a non-Anonymous user

If your hub is configured so that special user Anonymous does not have the required permissions, you will need to edit the script to submit credentials for a suitable hub user account.

We recommend using bearer authentication. Alternative mechanisms are described in the table below.

For bearer authentication, do the following.

If you do not already have a suitable session and bearer token to use, generate them now.
1. Navigate to the User Sessions page for your selected hub user account.
2. Use the Create Session form to create a new session with a suitably long Expires setting.
  When you click Create Session, the page will be reloaded and the Bearer Token for your new session will be displayed at the top of the page.
  IMPORTANT: Make a note of the Bearer Token now. Once you refresh or navigate away from this page there will be no further opportunity to view the token.
3. Save the bearer token to a file. The remainder of these instructions will refer to this file as path/to/bearerfile.
Edit find_project_analysis.py to add the following setting before the curl_cmd setting.
```
with open('path/to/bearerfile','r') as bearerfile:
    bearer_token = bearerfile.read().strip()
```
where

path/to/bearerfile	is the path to the file containing the bearer token you want to use.

Edit find_project_analysis.py to modify the setting of curl_cmd:

curl_cmd=['curl', '-H', f"Authorization: Bearer {bearer_token}"]

For more information about bearer authentication in CodeSonar, see User Sessions and Anonymous Sessions: Bearer Authentication.
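
The two bearer-authentication edits above can also be combined into a small helper that fails early if the token file turns out to be empty. This is a sketch under stated assumptions: the helper name bearer_curl_cmd is hypothetical, and the header format is the one shown in the curl_cmd setting above.

```python
def bearer_curl_cmd(token_text, curl='curl'):
    """Return a curl command prefix that sends the given bearer token.

    token_text is the raw contents of the bearer token file; surrounding
    whitespace (such as a trailing newline) is removed."""
    token = token_text.strip()
    if not token:
        raise ValueError('The bearer token is empty.')
    return [curl, '-H', 'Authorization: Bearer {0}'.format(token)]


# Made-up token, for illustration only.
print(bearer_curl_cmd('abc123\n'))
```

In the script, you could read path/to/bearerfile as shown above and then set curl_cmd = bearer_curl_cmd(bearer_token).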

If you don't want to use bearer authentication, you can choose one of the options from the following table.

Certificate

If the hub is configured for certificate-based authentication, you can edit the script to specify a suitable user certificate.

If the account does not already have a user certificate and key, generate them now.

Edit find_project_analysis.py to add the following settings before the curl_cmd setting.

cert_path='path/to/usercert.pem'
key_path='path/to/certkey.pem'

where

path/to/usercert.pem	is the path to the hub user account's user certificate.
path/to/certkey.pem	is the path to the private key corresponding to the user certificate.

Edit find_project_analysis.py to modify the setting of curl_cmd:

curl_cmd=['curl', '-k',
          '-X', 'POST',
          '-d', 'sif_sign_in=yes&sif_use_tls=yes&sif_log_out_competitor=yes',
          '--cert', cert_path,
          '--key', key_path]

Omit the -k argument if your hub's hub server certificate is not self-signed.

Hard-Coded Username/Password

If you will be running the Python script under secure conditions, you may be willing to specify the account username and password directly in the Python script invocation.

For example, if your hub location is http://[::1]:7340 and the hub user account has username jean and password xyz123, the first argument to the Python script would be http://jean:xyz123@[::1]:7340.

Example: Use the hub user account with username jean and password xyz123 to authorize finding the most recent analysis of the project named "Project X" on the hub at http://[::1]:7340, saving the results in directory /tmp/csvout:

python find_project_analysis.py http://jean:xyz123@[::1]:7340 PROJECT%20X /tmp/csvout

Both username and password must also be URL-encoded.
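
If the username or password contains reserved characters, encoding them by hand is error-prone. The following sketch shows one way to build the first script argument with the standard urllib.parse.quote() function; the helper name hub_url_with_credentials and the second example password are hypothetical.

```python
from urllib.parse import quote


def hub_url_with_credentials(protocol, hostport, username, password):
    """Return protocol://username:password@host:port with the username
    and password URL-encoded."""
    # safe='' makes quote() encode every reserved character,
    # including '/', ':' and '@'.
    return '{0}://{1}:{2}@{3}'.format(protocol,
                                      quote(username, safe=''),
                                      quote(password, safe=''),
                                      hostport)


print(hub_url_with_credentials('http', '[::1]:7340', 'jean', 'xyz123'))
print(hub_url_with_credentials('http', '[::1]:7340', 'jean', 'p@ss/w:rd'))
```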

Username/Password: Other

See the curl man page for alternative username/password authentication mechanisms.

See CodeSonar HTTP API: Authentication for more information on authentication strategies.

Writing Other Scripts

You can follow the overall structure of this script to create Python scripts that download other kinds of file from the hub.

In general, the process for constructing a script will be along the following lines.

Determine the GUI page type you are interested in.
Look at the GUI reference page for your required page type to determine the following information.
- The page's URL or URL scheme: use this to construct the URL or URLs for your script to download.
- The alternative page output formats that are available: if you want to process pages in one of these formats rather than HTML, check that your required format is available and construct the download URLs accordingly.
- The RBAC permissions required to access the page and its contents. If special user Anonymous does not have these permissions, the Python script will need to provide authentication credentials for a hub user account that has the permissions.
If your script will be carrying out a task that modifies the hub database (adding, deleting, or modifying elements), inspect the GUI page HTML to look at the <form ... > element that provides access to the functionality you are interested in. Your script will need to include an HTTP POST request (via curl or similar) that corresponds to the one issued when the form is submitted.
Make sure your Python script includes the following elements.
- One or more download URLs, or a mechanism for constructing them.
- Some way of handling the downloaded URLs: one of the following.
  - The location to which the URLs should be downloaded, or a way to obtain the location.
  - An explicit redirect to the null location.
  - No specific handling, so that the downloaded files are all part of the script output.
- A curl invocation that acts on the URL or URLs.
See the troubleshooting notes above if you encounter problems.
The Python script in this task is structured to be as similar as possible to those in the .sh and .bat download script examples. You may prefer to do one or more of the following to take advantage of Python libraries.
- Download pages as XML, and parse with xml.etree.ElementTree to extract required information. Note, however, that there are some cases where information that can be extracted from the HTML version of a given GUI page type is not present in the XML version. For example, the Project ID of the analyzed project can be extracted from an HTML Analysis page but not from the XML (or CSV, or JSON) version of the same page.
- Use subprocess.Popen() instead of subprocess.check_call() to let multiple fetches occur in parallel. This can provide significant improvements in running time if the script is fetching a large number of pages.
- Use one of the following instead of invoking curl with subprocess methods.
  - urllib2 (Python 2) / urllib.request (Python 3)
  - requests (not in the Standard Library)
  - PycURL (not in the Standard Library)
- Have the script URL-encode script arguments, rather than requiring that the user encode them. For example, you can use urllib.urlencode() (Python 2) / urllib.parse.urlencode() (Python 3).
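
As an illustration of the last two suggestions, the following Python 3 sketch URL-encodes the project name inside the script and fetches the search results with urllib.request instead of curl. It is not a drop-in replacement for the script above. The function names are hypothetical; the sketch uses urllib.parse.quote() rather than urlencode() because it encodes a single value, encodes the double quotes around the name as %22, assumes the response is UTF-8 text, and does not handle authentication or self-signed certificates.

```python
from urllib.parse import quote
from urllib.request import urlopen


def project_search_url(hubaddr, project_name):
    """Build the project search CSV URL from an unencoded project name."""
    encoded_name = quote(project_name, safe='')
    return ('{0}/project_search.csv?query=project%3D%22{1}%22&scope=all'
            .format(hubaddr, encoded_name))


def fetch_project_search_csv(hubaddr, project_name, timeout=30):
    """Download the search results and return them as text.

    Assumption: the hub returns UTF-8 encoded CSV."""
    url = project_search_url(hubaddr, project_name)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode('utf-8')


# Building the URL does not contact the hub.
print(project_search_url('http://[::1]:7340', 'My Favorite Project'))
```

With this approach the user passes the project name as it appears in the GUI, and the script takes responsibility for encoding it.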

Links

Note. This page contains references to HTTP API documentation, which is served directly by the hub and cannot be accessed via a file:// URL. For active HTTP API documentation links, start a hub (if one is not already running), then open the manual from the hub.

To report problems with this documentation, please visit https://support.codesecure.com/.