Task: Download Warnings from an Analysis with a Python Script

CodeSonar® 9.2p0

CONFIDENTIAL

CodeSecure Inc

You can use a script to download web GUI files from the command line.

This task provides a Python script for downloading warning reports from a specified analysis, along with some suggestions for modifying the script to suit your needs.

For other scripting options, see:

Note: The Python script in this task is structured to be as similar as possible to those in the other download script tasks listed above.

Some Python-specific script authoring notes are provided below.
For a more 'Pythonic' download script example, see getlogs.py.

If you do not need the warning reports and just want a list of warnings, use codesonar dump_warnings.py instead.

To get all warnings from a single analysis, you can use the /analysis/analysis_id-allwarnings.xml URL. For example, to download all warnings from the analysis with ID 5 from the hub at http://myhub:7340, use http://myhub:7340/analysis/5-allwarnings.xml.
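
As a minimal sketch (not part of the script provided in this task), the URL pattern above could be assembled and fetched with the Python standard library. The helper names allwarnings_url and fetch_allwarnings are hypothetical, and the fetch assumes that Anonymous is permitted to read the analysis.

```python
import urllib.request

def allwarnings_url(hubaddr, analysis_id):
    """Build the all-warnings XML URL for one analysis (pattern from the text above)."""
    return f"{hubaddr.rstrip('/')}/analysis/{analysis_id}-allwarnings.xml"

def fetch_allwarnings(hubaddr, analysis_id, out_path):
    """Download the XML to out_path. Assumes no credentials are needed."""
    url = allwarnings_url(hubaddr, analysis_id)
    with urllib.request.urlopen(url) as resp, open(out_path, 'wb') as out:
        out.write(resp.read())

# Only builds the URL; nothing is fetched until fetch_allwarnings() is called.
print(allwarnings_url('http://myhub:7340', 5))
```

Calling fetch_allwarnings('http://myhub:7340', 5, 'allwarnings.xml') would then save the file in the current directory, provided the hub is reachable.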

Permissions Required
Other Requirements
The Download Script
Using the Script
Modifying the Script
Writing Other Download Scripts
Running Scripts Automatically
Links

Permissions Required

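The script requires that special user Anonymous has the following permissions for the analysis A of interest: ANALYSIS_READ, ANALYSIS_WARNING_EXISTS, and ANALYSIS_WARNING_READ (the permissions named in the script's error messages).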

See Modifying the Script for information on modifying the script to specify credentials for a non-Anonymous user with the required permissions.

Other Requirements

The example script imports standard Python modules argparse, csv, os, shutil, and subprocess.

You will need a Python installation to run the script. If you do not have a local installation, you can use the cspython shipped with CodeSonar:

Make sure that $CSONAR/codesonar/bin is in your PATH, where $CSONAR is the CodeSonar installation directory.
To run the script, invoke Python with cspython (rather than python as in the examples below).

Use the cURL shipped with CodeSonar: $CSONAR/third-party/curl/inst/bin/curl, where $CSONAR is the CodeSonar installation directory. Either:

add $CSONAR/third-party/curl/inst/bin/ to the front of your PATH,
or
edit the script so that the setting of curl_cmd specifies the full path to $CSONAR/third-party/curl/inst/bin/curl.
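
If you take the second option, one possible way to set curl_cmd is sketched below. The helper find_curl is hypothetical, and reading the installation directory from a CSONAR environment variable is an assumption of this sketch: the text above uses $CSONAR only as a placeholder for the installation directory.

```python
import os
import shutil

def find_curl(csonar_dir=None):
    """Return a curl_cmd list, preferring the cURL shipped with CodeSonar.

    csonar_dir is the CodeSonar installation directory ($CSONAR in the text).
    """
    if csonar_dir:
        candidate = os.path.join(csonar_dir, 'third-party', 'curl',
                                 'inst', 'bin', 'curl')
        # shutil.which() returns the path only if it names an executable file.
        resolved = shutil.which(candidate)
        if resolved:
            return [resolved]
    # Fall back to whichever curl is first on PATH.
    return ['curl']

# Assumption: CSONAR may be set in the environment; if not, plain 'curl' is used.
curl_cmd = find_curl(os.environ.get('CSONAR'))
```

The fallback keeps the script's original behavior when the shipped cURL cannot be located.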

The Download Script

The following script will download the HTML warning reports for:

all warnings satisfying the default warning filter setting for the authorizing user (Anonymous, in this case),
from the analysis whose analysis ID matches the second argument passed to the script,
on the hub specified in the first argument to the script,

and store them under the directory specified in the third argument to the script.

import argparse
import csv
import os
import shutil
import subprocess

# Identify and download the warnings.
def download_warnings(hubaddr, analysis_id, savedir):
    curl_cmd = ['curl']

    def check_page(url):
        http_code = subprocess.check_output(curl_cmd
                                            + ['-w', '%{http_code}',
                                               '-o', os.devnull,
                                               url])
        return (int(http_code.strip())==200)

    analysis_csv_url = f'{hubaddr}/analysis/{analysis_id}.csv'
    out_csv = 'warnings.csv'

    if not check_page(analysis_csv_url):
        print(f'Could not access analysis page for analysis {analysis_id}')
        print('This may indicate one or more of the following.')
        print('- The analysis ID was not specified correctly.') 
        print('- You do not have ANALYSIS_READ permission for the analysis')
        exit(1)
    
    cmdline = curl_cmd + ['-o', out_csv,
                          analysis_csv_url]
    subprocess.check_call(cmdline)
    search_results=[]
    with open(out_csv, 'r', newline='') as csvfile:
        search_results = csv.DictReader(csvfile)

        firstrow=True
        for row in search_results:
            warning_rel = row['url'].replace('.txt','.html')
            warning_url = f'{hubaddr}{warning_rel}'
            if firstrow:
                if not check_page(warning_url):
                    print(f'Could not access Warning Report {warning_url}.')
                    print(f'Make sure you have ANALYSIS_WARNING_READ permission for analysis {analysis_id}.')
                    exit(1)
                firstrow=False
            dl_cmdline = curl_cmd + ['-o', os.path.basename(warning_rel),
                                     warning_url]
            subprocess.check_call(dl_cmdline)

        if firstrow: 
            # No results: report and exit.
            print('No warnings were found for analysis ID', analysis_id)
            print('This may indicate one or more of the following.')
            print('- The analysis ID was not specified correctly.') 
            print('- You do not have ANALYSIS_WARNING_EXISTS permission for the analysis.')
            print('- The analysis has no active warnings.')
            exit(1)


# Set up.
def go():
    parser = argparse.ArgumentParser(
        description=('Download warnings from an analysis on a CodeSonar hub, '
                     + 'as specified by the command-line arguments.'))
        
    parser.add_argument("hub",
                        help="The hub URL.")
    parser.add_argument("aid",
                        help="The analysis ID.")
    parser.add_argument("savedir",
                        help="The save directory.")
    args = parser.parse_args()

    allargs = (args.hub, args.aid, args.savedir)
    if not any([a is None for a in allargs]):
        if os.path.isdir(args.savedir):
            print(f'Output directory {args.savedir} exists, deleting and recreating.')
            shutil.rmtree(args.savedir)
        os.mkdir(args.savedir)
        os.chdir(args.savedir)

        download_warnings(*allargs)

go()

This Python script works as follows.

It accesses the specified hub to download the CSV version of the Analysis: Warnings page for the analysis with the specified analysis ID.
(This is the same file you would download if you navigated to the Analysis: Warnings page in the web GUI and clicked the CSV link. You may like to try this now so you can see how the file is formatted.)
For each line in the CSV file (representing a single warning) except for the first (which is a header line), it does the following.
1. Constructs the URL for the HTML warning report by taking the last entry in the line (representing the URL of the text version of the warning report) and replacing '.txt' at the end with '.html'.
2. Downloads the URL, storing it directly in the specified save directory under the base name of its path.
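
The URL construction step can be tried in isolation, without contacting a hub. The following sketch assumes only what the script itself relies on, namely that the CSV has a url column. The function name report_urls and the two-row sample CSV are made up for illustration; a real export has more columns and different paths.

```python
import csv
import io

def report_urls(hubaddr, csv_text, extension='.html'):
    """Return warning report URLs derived from the 'url' column of the CSV."""
    urls = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rel = row['url']
        # Swap only a trailing '.txt', leaving the rest of the path untouched.
        if rel.endswith('.txt'):
            rel = rel[:-len('.txt')] + extension
        urls.append(f'{hubaddr}{rel}')
    return urls

# Made-up sample data: the paths are placeholders, not real hub paths.
sample_csv = 'id,url\n101,/example/101.txt\n102,/example/102.txt\n'
print(report_urls('http://myhub:7340', sample_csv))
```

Unlike the str.replace() call in the script, this version changes '.txt' only at the end of the path, which is a slightly stricter reading of the step described above.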

Using the Script

To use this script with your hub, do the following.

Create a directory to save your downloaded warnings in. The remainder of these instructions will refer to this directory as savedir.
Copy download_warnings.py to a suitable directory. The remainder of these instructions will refer to this directory as rundir.
This should not be anywhere under savedir, since the script starts by deleting everything in that directory.

Run the script:

cd rundir
python download_warnings.py protocol://host:port aid savepath

where

protocol	is the protocol for your hub: http or https.
host:port	is the location of your hub.
aid	is the analysis ID for the analysis whose warnings you wish to download. You can find the analysis ID in any of the following places.
- In the URL specified at the end of the build/analysis command output, which will be of the form http://hub_location/analysis/analysis_id.html.
- In file path/to/projname.prj_files/aid.txt, where path/to/projname.prj_files/ is the analysis directory.
- In the output of the following command, where path/to/projname.prj_files/ is the analysis directory:
  codesonar analysis_id.py path/to/projname.prj_files/
- In the Analysis Details section of the corresponding Analysis page.
savepath	is the path to the savedir directory you created in the first step.

The script will output information about each HTML file it downloads.

Inspect the contents of savedir. It should contain a file called warnings.csv, along with all the HTML warning reports from your specified analysis.
Open one of the downloaded HTML files in a web browser to confirm that it contains a warning report.
Note that links to files that the script did not download (including images) will be broken: this is expected behavior.

Invocation Examples

Example 1: Using the hub at http://[::1]:7341, download warnings for the analysis with ID 3 and save in directory /tmp/mywarnings:
python download_warnings.py http://[::1]:7341 3 /tmp/mywarnings
Example 2: Using the hub at http://[::1]:7341, download warnings for the analysis whose analysis directory is /myprojects/projectX.prj_files/ and save in directory /tmp/mywarnings:
python download_warnings.py http://[::1]:7341 `codesonar analysis_id.py /myprojects/projectX.prj_files/` /tmp/mywarnings
- If it is not in your PATH, adjust the command to include the full path to your codesonar executable.
- On Cygwin, do the following instead.
  python download_warnings.py http://[::1]:7341 `codesonar analysis_id.py /myprojects/projectX.prj_files/ | tr -d '\r'` /tmp/mywarnings
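
The backtick substitution in Example 2 could also be done from Python. This is a sketch only: the function names are hypothetical, and it assumes that codesonar analysis_id.py prints nothing but the analysis ID, as the shell examples above imply. Stripping whitespace covers the carriage return that the Cygwin variant removes with tr.

```python
import subprocess

def clean_analysis_id(raw_output):
    """Decode command output and drop surrounding whitespace, including CR/LF."""
    if isinstance(raw_output, bytes):
        raw_output = raw_output.decode('ascii')
    return raw_output.strip()

def analysis_id_from_dir(analysis_dir, codesonar='codesonar'):
    """Run 'codesonar analysis_id.py <analysis_dir>' and return the ID as a string."""
    out = subprocess.check_output([codesonar, 'analysis_id.py', analysis_dir])
    return clean_analysis_id(out)
```

Pass the full path to the codesonar executable as the second argument if it is not in your PATH.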

Troubleshooting

Get more verbose output	For more verbose curl output, edit download_warnings.py so that curl is invoked with the -v flag. For example: curl_cmd=['curl', '-v']
No files downloaded	If the HTML warning reports are not present, check the command line output for information. If the only line of output is the URL of the Analysis: Warnings CSV file, this indicates that cURL did not attempt to download any warning reports. There are three possible reasons.
1. curl could not download the CSV file. The possible causes are as follows.
  - It is trying to download a page that does not exist. Check to make sure that you can open the CSV file URL in a web browser. You may need to pass different hub location or analysis ID arguments to the script.
  - It is trying to download a page that Anonymous does not have permission to access. If Anonymous does not have access to the Analysis: Warnings page, you will need to specify credentials for a user with the required permissions.
2. curl downloaded the CSV file, but the file did not list any warnings. The possible causes are as follows.
  - The analysis does not have any warnings that satisfy the default warning filter setting for Anonymous (this includes the case where the analysis does not have any warnings at all). If necessary, edit the script to specify a different warning filter.
  - The analysis does have warnings that satisfy the default warning filter setting for Anonymous, but Anonymous does not have ANALYSIS_WARNING_EXISTS permission for the analysis. You will need to specify credentials for a user with the required permissions.
3. You have an HTTPS-enabled hub with a self-signed hub server certificate. To instruct curl to accept self-signed certificates, edit download_warnings.py so that curl is invoked with the -k flag. For example: curl_cmd=['curl', '-k']
Downloaded files contain "Permission Denied" messages	If there are downloaded HTML files but they contain "Permission Denied" messages rather than warning reports, this indicates that Anonymous does not have ANALYSIS_WARNING_READ permission for the analysis. You will need to specify credentials for a user with the required permissions.
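
To check a large save directory for the last symptom, a small helper along the following lines may be useful. It is a sketch with a hypothetical function name, and it assumes that the affected pages contain the literal text "Permission Denied"; adjust the marker if your hub words the message differently.

```python
import os

def files_with_permission_denied(savedir, marker='Permission Denied'):
    """Return the sorted names of .html files in savedir that contain marker."""
    flagged = []
    for name in sorted(os.listdir(savedir)):
        if not name.endswith('.html'):
            continue
        path = os.path.join(savedir, name)
        # errors='replace' avoids failing on pages with unexpected encodings.
        with open(path, 'r', encoding='utf-8', errors='replace') as f:
            if marker in f.read():
                flagged.append(name)
    return flagged
```

An empty result does not prove that every file is a valid warning report; it only shows that the marker text was not found.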

Modifying the Script

You may wish to make one or more of the following modifications.

Download warnings that satisfy a specific filter
Download XML warning reports instead
Read analysis ID from analysis directory
Provide credentials for a non-Anonymous user

Download warnings that satisfy a specific filter

If an Analysis: Warnings URL is specified without a query string component, the default warning filter setting for the authorizing user (Anonymous, in this case) is applied. To apply a different visibility filter, the URL must include a query string that specifies a filter value.

For example, suppose we want to specify the all visibility filter.

Edit download_warnings.py so that the line setting analysis_csv_url is:
```
analysis_csv_url=f'{hubaddr}/analysis/{analysis_id}.csv?filter=\"all\"'
```
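
If you prefer not to embed quote characters in the URL by hand, the query string can be built with urllib.parse.urlencode(), which percent-encodes the double quotes as %22. This is a sketch with a hypothetical helper name; it assumes the hub decodes the query string in the usual way, so that the encoded and literal forms are equivalent.

```python
from urllib.parse import urlencode

def analysis_csv_url(hubaddr, analysis_id, visibility_filter=None):
    """Build the Analysis: Warnings CSV URL, optionally with a filter value."""
    url = f'{hubaddr}/analysis/{analysis_id}.csv'
    if visibility_filter is not None:
        # The filter name is wrapped in double quotes, as in the line above;
        # urlencode() then encodes each quote character as %22.
        url += '?' + urlencode({'filter': f'"{visibility_filter}"'})
    return url

print(analysis_csv_url('http://myhub:7340', 5, 'all'))
```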

Download XML warning reports instead

Warning reports can be output in text and XML formats as well as HTML. To download the XML versions, do the following.

Edit download_warnings.py so that the line setting warning_rel is changed to:
```
warning_rel = row['url'].replace('.txt','.xml')
```
(That is, replace the occurrence of .html with .xml).
If you want to process the downloaded files automatically, the XML schema at warning_report.xsd is likely to be useful.
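
As a starting point for such processing, the sketch below counts the element tags in an XML document using xml.etree.ElementTree. It deliberately assumes nothing about the warning report schema: the function name is hypothetical and the sample document is made up, so consult warning_report.xsd for the real element names.

```python
import collections
import xml.etree.ElementTree as ET

def tag_counts(xml_text):
    """Count how often each element tag occurs, including the root element."""
    root = ET.fromstring(xml_text)
    return collections.Counter(elem.tag for elem in root.iter())

# Made-up document, used only to show the shape of the result.
sample_xml = '<report><item/><item/><note>text</note></report>'
for tag, count in sorted(tag_counts(sample_xml).items()):
    print(tag, count)
```

If the real documents use XML namespaces, ElementTree reports each tag with its namespace URI in braces as a prefix.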

Read analysis ID from analysis directory

Instead of specifying the analysis ID on the command line, you can change the script to read the analysis directory from the command line and then read the most recent analysis ID from the analysis directory.

In function go(), replace line

parser.add_argument("aid", help="The analysis ID.")

with a line adding an adir argument.

parser.add_argument("adir", help="The analysis .prj_files directory.")

In function go(), change the definition of allargs to:
```
allargs = (args.hub, args.adir, args.savedir)
```

Modify the download_warnings() function header, and add two new lines to the very beginning of the function definition.

def download_warnings(hubaddr, analysis_dir, savedir):
    with open(os.path.join(analysis_dir,'aid.txt'),'r') as aidfile:
        analysis_id=aidfile.read().strip()

When you invoke the script, specify the analysis directory as the second argument.
For example: using the hub at http://[::1]:7341, download warnings for the analysis whose analysis directory is /myprojects/projectX.prj_files/ and save in directory /tmp/mywarnings.

python download_warnings.py http://[::1]:7341 /myprojects/projectX.prj_files/ /tmp/mywarnings
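
A slightly more defensive version of the aid.txt lookup is sketched below. The function name is hypothetical, and the check assumes that aid.txt holds only a numeric analysis ID, as the IDs used in the examples in this task suggest.

```python
import os

def read_analysis_id(analysis_dir):
    """Read the analysis ID from aid.txt in the analysis directory."""
    with open(os.path.join(analysis_dir, 'aid.txt'), 'r') as aidfile:
        # strip() removes the trailing newline, which would corrupt the URL.
        analysis_id = aidfile.read().strip()
    # Assumption: analysis IDs are plain decimal numbers.
    if not analysis_id.isdigit():
        raise ValueError(f'Unexpected aid.txt contents: {analysis_id!r}')
    return analysis_id
```

Failing early here gives a clearer error than a later "Could not access analysis page" message.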

Provide credentials for a non-Anonymous user

If your hub is configured so that special user Anonymous does not have the required permissions, you will need to edit the script to submit credentials for a suitable hub user account.

We recommend using bearer authentication. Alternative mechanisms are described in the table below.

For bearer authentication, do the following.

If you do not already have a suitable session and bearer token to use, generate them now.
1. Navigate to the User Sessions page for your selected hub user account.
2. Use the Create Session form to create a new session with a suitably long Expires setting.
  When you click Create Session, the page will be reloaded and the Bearer Token for your new session will be displayed at the top of the page.
  IMPORTANT: Make a note of the Bearer Token now. Once you refresh or navigate away from this page there will be no further opportunity to view the token.
3. Save the bearer token to a file. The remainder of these instructions will refer to this file as path/to/bearerfile.
Edit download_warnings.py to add the following setting before the curl_cmd setting.
```
with open('path/to/bearerfile','r') as bearerfile:
   bearer_token = bearerfile.read().strip()
```
where

path/to/bearerfile	is the path to the file containing the bearer token you want to use.

Edit download_warnings.py to modify the setting of curl_cmd:

curl_cmd=['curl', '-H', f"Authorization: Bearer {bearer_token}"]

For more information about bearer authentication in CodeSonar, see User Sessions and Anonymous Sessions: Bearer Authentication.
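
If you replace curl with the Python standard library, the same Authorization header can be attached to a urllib.request.Request. This is a sketch with hypothetical helper names; it builds the request without sending it, and it assumes the hub accepts the header in the same form that the curl setting above sends.

```python
import urllib.request

def read_bearer_token(bearer_path):
    """Read the bearer token from a file, dropping surrounding whitespace."""
    with open(bearer_path, 'r') as bearerfile:
        return bearerfile.read().strip()

def bearer_request(url, bearer_token):
    """Build (but do not send) a GET request carrying the bearer token."""
    return urllib.request.Request(
        url, headers={'Authorization': f'Bearer {bearer_token}'})
```

To fetch a page, pass the result to urllib.request.urlopen(), for example urllib.request.urlopen(bearer_request(url, read_bearer_token('path/to/bearerfile'))).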

If you don't want to use bearer authentication, you can choose one of the options from the following table.

Certificate

If the hub is configured for certificate-based authentication, you can edit the script to specify a suitable user certificate.

If the account does not already have a user certificate and key, generate them now.

Edit download_warnings.py to add the following settings before the curl_cmd setting.

cert_path='path/to/usercert.pem'
key_path='path/to/certkey.pem'

where

path/to/usercert.pem	is the path to the hub user account's user certificate.
path/to/certkey.pem	is the path to the private key corresponding to the user certificate.

Edit download_warnings.py to modify the setting of curl_cmd:

curl_cmd=['curl', '-k', 
          '-X', 'POST',
          '-d', 'sif_sign_in=yes&sif_use_tls=yes&sif_log_out_competitor=yes',
          '--cert', cert_path,
          '--key', key_path]

Omit the -k argument if your hub's hub server certificate is not self-signed.

Hard-Coded Username/Password

If you will be running the Python script under secure conditions, you may be willing to specify the account username and password directly in the script invocation.

For example, if your hub location is http://[::1]:7340 and the hub user account has username jean and password xyz123, the first argument to the script would be http://jean:xyz123@[::1]:7340.

Example: Use the hub user account with username jean and password xyz123 to authorize downloading warnings from the hub at http://[::1]:7340 for the analysis with ID 3, saving in directory /tmp/mywarnings:

python download_warnings.py http://jean:xyz123@[::1]:7340 3 /tmp/mywarnings

Both username and password must be URL-encoded.
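
One way to do the encoding is with urllib.parse.quote(), as in the following sketch. The function name is hypothetical. Passing safe='' makes quote() encode characters such as / and @ as well, which would otherwise be misread as URL delimiters.

```python
from urllib.parse import quote

def hub_url_with_credentials(protocol, host_port, username, password):
    """Return protocol://user:password@host:port with both fields URL-encoded."""
    user = quote(username, safe='')
    pw = quote(password, safe='')
    return f'{protocol}://{user}:{pw}@{host_port}'

print(hub_url_with_credentials('http', '[::1]:7340', 'jean', 'xyz123'))
```

The credentials jean and xyz123 contain no special characters, so they pass through unchanged and the result matches the invocation example above.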

Username/Password: Other

See the curl man page for alternative username/password authentication mechanisms.

See CodeSonar HTTP API: Authentication for more information on authentication strategies.

Writing Other Download Scripts

You can follow the overall structure of this script to create Python scripts that download other kinds of file from the hub.

In general, the process for constructing a script will be along the following lines.

Determine the GUI page type you are interested in.
Look at the GUI reference page for your required page type to determine the following information.
- The page's URL or URL scheme: use this to construct the URL or URLs for your script to download.
- The alternative page output formats that are available: if you want to process pages in one of these formats rather than HTML, check that your required format is available and construct the download URLs accordingly.
- The RBAC permissions required to access the page and its contents. If special user Anonymous does not have these permissions, the Python script will need to provide authentication credentials for a hub user account that has the permissions.
If your script will be carrying out a task that modifies the hub database (adding, deleting, or modifying elements), inspect the GUI page HTML to look at the <form ... > element that provides access to the functionality you are interested in. Your script will need to include an HTTP POST request (via curl or similar) that corresponds to the one issued when the form is submitted.
Make sure your Python script includes the following elements.
- One or more download URLs, or a mechanism for constructing them.
- Some way of handling the downloaded URLs: one of the following.
  - The location to which the URLs should be downloaded, or a way to obtain the location.
  - An explicit redirect to the null location.
  - No specific handling, so that the downloaded files are all part of the script output.
- A curl invocation that acts on the URL or URLs.
See the troubleshooting notes above if you encounter problems.
The Python script in this task is structured to be as similar as possible to those in the .sh and .bat download script examples. You may prefer to do one or more of the following to take advantage of Python libraries.
- Download pages as XML, and parse with xml.etree.ElementTree to extract required information. Note, however, that there are some cases where information that can be extracted from the HTML version of a given GUI page type is not present in the XML version. For example, the Project ID of the analyzed project can be extracted from an HTML Analysis page but not from the XML (or CSV, or JSON) version of the same page.
- Use subprocess.Popen() instead of subprocess.check_call() to let multiple fetches occur in parallel. This can provide significant improvements in running time if the script is fetching a large number of pages.
- Use one of the following instead of invoking curl with subprocess methods.
  - urllib2 (Python 2) / urllib.request (Python 3)
  - requests (not in the Standard Library)
  - PycURL (not in the Standard Library)
- Have the script URL-encode script arguments, rather than requiring that the user encode them. For example, you can use urllib.urlencode() (Python 2) / urllib.parse.urlencode() (Python 3).
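
The parallel fetching idea can also be combined with the urllib suggestion. The sketch below uses a thread pool from concurrent.futures rather than subprocess.Popen(), which is a different technique from the one named in the list above. The function names are hypothetical, and the default fetch_url assumes that no credentials are needed.

```python
import concurrent.futures
import urllib.request

def fetch_url(url):
    """Fetch one URL and return the response body as bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def fetch_all(urls, fetch=fetch_url, max_workers=4):
    """Fetch urls concurrently and return a {url: result} dictionary."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, u): u for u in urls}
        for fut in concurrent.futures.as_completed(futures):
            # result() re-raises any exception raised while fetching.
            results[futures[fut]] = fut.result()
    return results
```

The fetch parameter lets you substitute a function that adds authentication or writes each page to disk. Keep max_workers modest to avoid placing a heavy load on the hub.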

Running Scripts Automatically

You can use your system tools to arrange for the Python script to be run automatically.

For example, if you are using cron, add the following line to your crontab to run download_warnings.py at 2:05am every day, downloading from http://red:7341 the warnings issued by the analysis whose analysis directory is /home/projectX.prj_files/ and saving them in /tmp/mywarnings.

5 2 * * * python /path/to/download_warnings.py http://red:7341 `cat /home/projectX.prj_files/aid.txt` /tmp/mywarnings

Links

Note. This page contains references to HTTP API documentation, which is served directly by the hub and cannot be accessed via a file:// URL. For active HTTP API documentation links, start a hub (if one is not already running), then open the manual from the hub.

To report problems with this documentation, please visit https://support.codesecure.com/.