- Intro
- Installation for Users
- Installation for Devs
- Expectation for Devs
- Known Problems / Limitations
- Help
This repository features a deep learning classifier for analyzing software repositories. The tool uses the Ecore metamodel's 'type graph' in conjunction with a graph convolutional network. Currently, the classifier sorts repositories into four classes: Application, Framework, Library, and Plugin. The labels are not mutually exclusive and are represented in a multi-hot encoded format.
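As a minimal illustration of multi-hot encoding, the sketch below maps a set of labels to a vector with one position per class. The class order and the helper name `encode_labels` are assumptions made for this example and are not taken from the tool's code.

```python
# Class order is an assumption for illustration; the tool may order them differently.
CLASSES = ["Application", "Framework", "Library", "Plugin"]


def encode_labels(labels):
    """Return a multi-hot vector: 1 where the class applies, 0 otherwise."""
    unknown = set(labels) - set(CLASSES)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in CLASSES]


# A repository that is both a framework and a library sets two positions at once.
print(encode_labels({"Framework", "Library"}))  # [0, 1, 1, 0]
```

Unlike one-hot encoding, more than one position can be 1, which is what allows a repository to carry several labels.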
- Clone the repository by executing the following command:
git clone https://github.com/isselab/github-classifier.git
- Open the cloned repository using your preferred Integrated Development Environment (IDE).
For the purposes of this instruction, we will assume the use of PyCharm from JetBrains.
- Change the directory to data/input by running the following command:
cd data/input
- Clone the repositories you wish to analyze by executing:
git clone LINK_TO_REPO_YOU_WANT
- Run main.py
The default threshold for identification is 50%. To change it, edit the corresponding setting in settings.py, then rerun main.py to apply the change.
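To illustrate what the threshold does, the sketch below keeps every class whose score reaches the threshold. It assumes the classifier outputs one probability per class and that a score equal to the threshold counts as a match; the names `decode_prediction` and `THRESHOLD` are hypothetical, and the actual setting name in settings.py may differ.

```python
# Assumed class order, for illustration only.
CLASSES = ["Application", "Framework", "Library", "Plugin"]
THRESHOLD = 0.5  # hypothetical name; mirrors the 50% default


def decode_prediction(scores, threshold=THRESHOLD):
    """Return the names of all classes whose score is at least the threshold."""
    if len(scores) != len(CLASSES):
        raise ValueError("expected one score per class")
    return [name for name, score in zip(CLASSES, scores) if score >= threshold]


scores = [0.81, 0.12, 0.64, 0.30]  # made-up example scores
print(decode_prediction(scores))       # ['Application', 'Library']
print(decode_prediction(scores, 0.7))  # ['Application']
```

Raising the threshold yields fewer, more confident labels; lowering it yields more labels per repository.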
- Clone the repository by executing the following command:
git clone https://github.com/isselab/github-classifier.git
- Open the cloned repository using your preferred Integrated Development Environment (IDE).
- Check data/labeled_dataset_repos.xlsx.
This xlsx file contains the labeled repositories the tool is trained with.
You may want to change it according to your needs.
- We strongly recommend using a GPU for training.
To verify GPU availability, please run the TorchGPUCheck.py script.
If the output is "Cuda is available!", you may proceed to step 3.
If the output indicates that "Cuda is not available," please follow the instructions provided in the terminal.
Additionally, refer to the guide in the Help section for further assistance in resolving any issues.
- Run prepareDataset.py
- Change the experiment_name in settings.py in the training section.
- Run training.py
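The GPU check in the steps above can be sketched as follows. This is not the contents of TorchGPUCheck.py, only a minimal equivalent that assumes PyTorch is the framework in use; `cuda_status` is a hypothetical helper name.

```python
def cuda_status():
    """Report whether PyTorch can see a CUDA-capable GPU."""
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        return "PyTorch is not installed"
    if torch.cuda.is_available():
        # Device 0 is the first visible GPU.
        return f"Cuda is available! Device: {torch.cuda.get_device_name(0)}"
    return "Cuda is not available"


print(cuda_status())
```

If this reports that CUDA is not available on a machine with an NVIDIA GPU, the installed PyTorch build is often a CPU-only one.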
- Create an issue on the GitHub issues page.
- Open a branch named after the issue.
- Write code that fixes the issue
- Write test code to be sure it works.
- Comment your code well to be sure it can be understood.
- Create a merge request
- The tool only processes Python files.
- The dataset contains Python software repositories from GitHub, all with a dependency on at least one ML library.
- Labels cannot be changed easily (work in progress).
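Because only Python files are processed, it can help to check how many a cloned repository contains before running the tool. The sketch below is a standalone helper and not part of the tool; `count_python_files` is a hypothetical name, and skipping the .git directory is an assumption made for this example.

```python
from pathlib import Path


def count_python_files(repo_dir):
    """Count .py files below repo_dir, skipping anything inside .git."""
    root = Path(repo_dir)
    return sum(
        1
        for path in root.rglob("*.py")
        if path.is_file() and ".git" not in path.relative_to(root).parts
    )


if __name__ == "__main__":
    # data/input is the folder the repositories to analyze are cloned into.
    input_dir = Path("data/input")
    if input_dir.is_dir():
        for repo in sorted(p for p in input_dir.iterdir() if p.is_dir()):
            print(f"{repo.name}: {count_python_files(repo)} Python files")
```

A repository that reports zero Python files gives the tool nothing to process.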
- Torch CUDA Guide, see "https://www.geeksforgeeks.org/how-to-set-up-and-run-cuda-operations-in-pytorch/"
- GRaViTY tool for visualizing the metamodels, see "https://github.com/GRaViTY-Tool/gravity-tool?tab=readme-ov-file"