This tool is designed to semi-automate the detection of potential security features in Java software projects using a keyword-based approach.
First, you need to clone the repository you want to analyze locally.
- Start by running the Security Features API Automated Detection Tool. Navigate to the tool's path:
  cd security-feature-mining-study\SecurityFeatureMiningStudy\security-feature-localization
- Then, run the following command:
  java -jar target/security-feature-localization.jar locate "Cloned_Repo_Path" --mappings "security-feature-mining-study\Resources\lib-mappings"
- You will find a JSON file named features.json generated in the result directory within the cloned repository.
- Copy that file into the main script directory, SecFeatFinder.
Run the script with a GitHub repository URL. The tool will:
- Clone the repository into a local repos/ folder, if it was not already cloned in the previous step.
- Extract the project name from the URL.
- Ensure no duplicate cloning occurs.
This step is handled by the clone_repository() function.
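The cloning step described above can be sketched as follows. This is a minimal illustration and not the script's actual clone_repository() code: the helper project_name_from_url() and the base_dir parameter are assumptions introduced here, and the sketch assumes git is available on the PATH.

```python
import subprocess
from pathlib import Path
from urllib.parse import urlparse


def project_name_from_url(repo_url: str) -> str:
    """Return the last path segment of the URL, without a trailing '.git'."""
    path = urlparse(repo_url).path.rstrip("/")
    name = path.rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def clone_repository(repo_url: str, base_dir: str = "repos") -> Path:
    """Clone repo_url into base_dir/<project name> unless it already exists."""
    target = Path(base_dir) / project_name_from_url(repo_url)
    if target.exists():
        # Already cloned in an earlier step: skip to avoid a duplicate clone.
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    # Assumption: the 'git' executable is installed and on the PATH.
    subprocess.run(["git", "clone", repo_url, str(target)], check=True)
    return target
```

Checking for the target directory before calling git is what keeps a repository cloned in the previous step from being cloned again.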
Security keywords are maintained in a structured JSON file named SecList.json.
You can update the keywords for each category during the validation process.
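Updating the keyword list during validation could look like the sketch below. The schema of SecList.json is not documented here, so the sketch assumes a simple mapping from category name to a list of keywords; the function names are hypothetical and not part of the script.

```python
import json


def load_keywords(path: str = "SecList.json") -> dict[str, list[str]]:
    """Load the keyword file (assumed shape: {category: [keyword, ...]})."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def add_keyword(keywords: dict[str, list[str]], category: str, keyword: str) -> None:
    """Add a keyword to a category, creating the category if needed."""
    entries = keywords.setdefault(category, [])
    if keyword not in entries:
        # Skip duplicates so repeated validation passes stay idempotent.
        entries.append(keyword)


def save_keywords(keywords: dict[str, list[str]], path: str = "SecList.json") -> None:
    """Write the keyword mapping back to disk as indented JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(keywords, fh, indent=2, sort_keys=True)
```

If the real file uses a different layout (for example nested subcategories), only load_keywords() and add_keyword() would need to change.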
After running the script, it provides initial insights such as:
- Total keyword matches.
- Number of affected files.
- Number of confidently identified features.
- Top 10 matched keywords.
These insights help prioritize and guide the manual validation phase.
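As an illustration of how such a summary can be derived, the sketch below computes three of the listed insights from a list of match records. The record layout (dictionaries with "file" and "keyword" keys) is an assumption, and the count of confidently identified features is left out because the tool's criterion for it is not described here.

```python
from collections import Counter


def summarize(matches: list[dict]) -> dict:
    """Summarize keyword matches; each match is assumed to have 'file' and 'keyword'."""
    keyword_counts = Counter(m["keyword"] for m in matches)
    return {
        "total_matches": len(matches),
        # A file counts once, however many keywords matched inside it.
        "affected_files": len({m["file"] for m in matches}),
        # The ten most frequent keywords, as (keyword, count) pairs.
        "top_keywords": keyword_counts.most_common(10),
    }


if __name__ == "__main__":
    sample = [
        {"file": "Auth.java", "keyword": "password"},
        {"file": "Auth.java", "keyword": "token"},
        {"file": "Crypto.java", "keyword": "password"},
    ]
    print(summarize(sample))
```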
In addition to annotations for potential security feature locations, a JSON file is generated with the same name as the analyzed repository, containing all detected positions.
The tool automatically generates annotations in a .feature-model file at the root of the project.
- Uses generic tags like Pos1, Pos2, ...
- Compatible with the HAnS plugin in IntelliJ IDEA.
This enables fast, structured manual inspection of flagged code segments.
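To make the generic tags concrete, the sketch below builds the text of a .feature-model file with one entry per detected position. It rests on two assumptions that should be checked against the HAnS documentation and the tool's real output: that the file lists a root feature followed by indented child features, and that a root name such as SecurityFeatures is acceptable. Both the function and the root name are hypothetical.

```python
from pathlib import Path


def build_feature_model(position_count: int, root: str = "SecurityFeatures") -> str:
    """Return feature-model text: a root line followed by Pos1..PosN children."""
    lines = [root]
    # Assumption: child features are indented by one tab under the root.
    lines.extend(f"\tPos{i}" for i in range(1, position_count + 1))
    return "\n".join(lines) + "\n"


def write_feature_model(project_root: str, position_count: int) -> Path:
    """Write the .feature-model file at the root of the analyzed project."""
    target = Path(project_root) / ".feature-model"
    target.write_text(build_feature_model(position_count), encoding="utf-8")
    return target
```

During manual validation, each generic PosN entry can then be renamed or moved under the category it actually belongs to.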
Identified security features should be organized into three categories:
- Security_Features_Custom – Developer-implemented features.
- Security_Features_Library – Features from third-party libraries.
- Security_Features_Library_Tool – Features detected by the API-call-based tool.
Each category is further organized by taxonomy category and subcategory.
If you identify security features from a library or framework that you want the tool to recognize automatically, you can add them to the security-feature-mining-study\Resources\lib-mappings\ directory.
Simply create a new JSON file using any existing one as a template.
- Ensure Python 3 is installed.
- The script is tailored exclusively for Java projects.
- Use the .feature-model output with the HAnS plugin for manual review.
This project is designed to analyze Java repositories for security feature usage. It is divided into four sub-modules:
The Repository Mining module mines Java repositories from GitHub and stores the relevant data in a PostgreSQL database.
Set the following environment variables before running this module:
- DB_URL: The URL to the PostgreSQL database.
- DB_USER: The database user.
- DB_PASSWORD: The password for the database user.
- GITHUB_OAUTH: A GitHub OAuth token to access the GitHub API.
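A quick pre-flight check for these variables might look like the following sketch. It is a standalone helper written for illustration, not part of the module itself; the function name missing_variables() is an assumption.

```python
import os

# Variables the Repository Mining module expects, as listed above.
REQUIRED_VARIABLES = ("DB_URL", "DB_USER", "DB_PASSWORD", "GITHUB_OAUTH")


def missing_variables(required=REQUIRED_VARIABLES, environ=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```

The other modules need only the three database variables, so the same helper can be called with a shorter tuple for them.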
The Security Feature Localization module extracts security feature usage from Java codebases. It can be used as a library or via CLI commands.
- Locate Features:
  locate PROJECT_DIR --mappings MAPPINGS_DIR
  This command generates a JSON file containing feature information in the project directory.
- Annotate Source Code:
  annotate PROJECT_DIR --mappings MAPPINGS_DIR
  This command creates HAnS feature annotations directly within the source code files.
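When the CLI is driven from a script, the two commands can be assembled as in the sketch below. The jar path is the one used in the invocation shown earlier in this document; the wrapper functions are hypothetical conveniences and assume java is on the PATH.

```python
import subprocess

# Jar location taken from the command shown earlier in this README.
DEFAULT_JAR = "target/security-feature-localization.jar"


def build_command(action: str, project_dir: str, mappings_dir: str,
                  jar: str = DEFAULT_JAR) -> list[str]:
    """Build the argument list for the 'locate' or 'annotate' command."""
    if action not in ("locate", "annotate"):
        raise ValueError(f"unsupported action: {action}")
    return ["java", "-jar", jar, action, project_dir, "--mappings", mappings_dir]


def run_localization(action: str, project_dir: str, mappings_dir: str) -> None:
    """Run the CLI and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_command(action, project_dir, mappings_dir), check=True)
```

Passing the arguments as a list rather than a single string avoids quoting problems with paths that contain spaces or backslashes.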
The Security Feature Mining module automatically downloads repositories gathered by the Repository Mining module, extracts security features, and stores the data in a PostgreSQL database.
Set the following environment variables before running this module:
- DB_URL: The URL to the PostgreSQL database.
- DB_USER: The database user.
- DB_PASSWORD: The password for the database user.
Additionally, Java, Maven and Gradle must be installed.
For more accurate results, it is recommended to download the most common Java JDK versions.
The JDK directories must follow the naming convention jdk-[version number] to be included in the analysis.
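The naming convention can be checked before a run with a sketch like the one below. It assumes that a version number consists of digits optionally separated by dots (for example jdk-17 or jdk-1.8); the module's actual matching rule and the location it searches are not documented here, so the pattern and the root argument are illustrative only.

```python
import re
from pathlib import Path

# Assumption: a version number is digits with optional dotted parts.
JDK_DIR_PATTERN = re.compile(r"^jdk-(\d+(?:\.\d+)*)$")


def find_jdks(root: str) -> dict[str, Path]:
    """Map version string to directory for every folder named jdk-<version>."""
    found = {}
    for entry in Path(root).iterdir():
        match = JDK_DIR_PATTERN.match(entry.name)
        if entry.is_dir() and match:
            found[match.group(1)] = entry
    return found
```

Directories such as openjdk17 or jdk17 would be skipped by this pattern, which is a quick way to spot installations that need renaming.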
The Metric Calculations module computes various metrics based on the security features extracted by the Security Feature Mining module.
Set the following environment variables before running this module:
- DB_URL: The URL to the PostgreSQL database.
- DB_USER: The database user.
- DB_PASSWORD: The password for the database user.
Additionally, a folder containing the JSON library metrics, named lib-mappings, must be present in the current working directory.
- Make sure the required environment variables are properly set for each module.
- This project requires access to a PostgreSQL database and GitHub API (for Repository Mining).