Classifier for GitHub Repos

Intro:

This repository contains a deep learning classifier for software repositories. The tool combines the 'type graph' of an Ecore metamodel with a graph convolutional network (GCN). Currently, the classifier assigns repositories to four classes: Application, Framework, Library, and Plugin. These labels are not mutually exclusive. They are multi-hot encoded, so a single repository can belong to more than one class.
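To illustrate how a graph convolutional network can turn a graph into one score per label, here is a minimal, generic sketch in plain Python. It uses the common symmetric-normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W), mean pooling over nodes, and one sigmoid per label, which suits non-exclusive, multi-hot labels. It is not this repository's implementation: the adjacency matrix, node features, weights, single layer, and pooling choice are all illustrative assumptions.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Return D^-1/2 (A + I) D^-1/2, the propagation matrix of a standard GCN layer."""
    n = len(adj)
    # Add self-loops so every node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One GCN layer: ReLU(normalized adjacency @ features @ weights)."""
    h = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in h]


def classify_graph(adj: Matrix, features: Matrix, w_hidden: Matrix, w_out: Matrix) -> list[float]:
    """Graph-level multi-label scores: one GCN layer, mean pooling, one sigmoid per label."""
    h = gcn_layer(adj, features, w_hidden)
    # Mean-pool the node embeddings into a single graph embedding (1 x hidden).
    pooled = [[sum(col) / len(h) for col in zip(*h)]]
    logits = matmul(pooled, w_out)[0]
    # Independent sigmoids instead of a softmax, because labels are not mutually exclusive.
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]


if __name__ == "__main__":
    # Hypothetical 3-node graph: node 0 is linked to nodes 1 and 2.
    adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
    features = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]  # 2 made-up features per node
    w_hidden = [[0.5, -0.2], [0.3, 0.8]]            # 2 -> 2 hidden units
    w_out = [[0.4, -0.1, 0.2, -0.3], [0.1, 0.6, -0.5, 0.2]]  # 2 -> 4 labels
    print([round(s, 3) for s in classify_graph(adj, features, w_hidden, w_out)])
```

In a trained model the weights are learned, and each score would then be compared against the classification threshold.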

Installation Instructions for Users:

1. Clone the repository:
git clone https://github.com/isselab/github-classifier.git
2. Open the cloned repository in your preferred Integrated Development Environment (IDE). These instructions assume PyCharm from JetBrains.
3. Install the dependencies listed in requirements.txt:
pip install -r requirements.txt
4. From the repository root, change to the data/input directory:
cd data/input
5. Clone each repository you want to analyze into this directory:
git clone LINK_TO_REPO_YOU_WANT
6. Return to the repository root and run main.py:
cd ../..
python main.py

The default classification threshold is 50%. To change it, edit the corresponding setting in settings.py, then rerun main.py to apply the change.
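As a minimal sketch of what the threshold does, assuming the model outputs one probability per label in the order listed in the intro, the helpers below mark every label whose probability reaches the threshold. The names to_multi_hot and decode are hypothetical, the threshold is passed in directly rather than read from settings.py, and whether the tool compares with >= or > at the boundary is an assumption.

```python
# Label order is an assumption; the tool's internal order may differ.
LABELS = ["Application", "Framework", "Library", "Plugin"]


def to_multi_hot(probabilities: list[float], threshold: float = 0.5) -> list[int]:
    """Turn per-label probabilities into a multi-hot vector (1 = label assigned)."""
    # Each label is thresholded independently, so several labels can be 1 at once.
    # Using >= at the boundary is an assumption of this sketch.
    return [1 if p >= threshold else 0 for p in probabilities]


def decode(multi_hot: list[int]) -> list[str]:
    """Map a multi-hot vector back to label names."""
    return [label for label, bit in zip(LABELS, multi_hot) if bit]


if __name__ == "__main__":
    probs = [0.81, 0.12, 0.64, 0.05]  # made-up model output
    print(decode(to_multi_hot(probs)))       # ['Application', 'Library']
    print(decode(to_multi_hot(probs, 0.7)))  # ['Application']
```

Raising the threshold makes the classifier more conservative: fewer labels are assigned, but each assigned label needs stronger support from the model.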

Installation Instructions for Devs:

Basic Installation:

1. Clone the repository:
git clone https://github.com/isselab/github-classifier.git
2. Open the cloned repository in your preferred Integrated Development Environment (IDE).
3. Install the dependencies listed in requirements.txt:
pip install -r requirements.txt

Retraining:

1. Check data/labeled_dataset_repos.xlsx.
This spreadsheet lists the labeled repositories the tool is trained on. Adjust it to your needs.
2. Check GPU availability. We strongly recommend training on a GPU. To verify that one is available, run the TorchGPUCheck.py script:
python TorchGPUCheck.py
If the output is "Cuda is available!", proceed to step 3.
If the output is "Cuda is not available", follow the instructions printed in the terminal. For further help resolving the issue, see the guide in the Help section.
3. Run prepareDataset.py.
4. Set experiment_name in the training section of settings.py to a name for your new experiment.
5. Run train.py.
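The GPU availability check described above can be approximated with the sketch below. It is not the contents of TorchGPUCheck.py; it only calls torch.cuda.is_available() and torch.cuda.get_device_name(), and it reports a missing PyTorch installation instead of crashing. The function name cuda_status is hypothetical.

```python
import importlib.util


def cuda_status() -> str:
    """Return a short message describing whether PyTorch can see a CUDA GPU."""
    # Check that PyTorch is installed before importing it.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch

    if torch.cuda.is_available():
        # Report the first visible CUDA device as an example.
        return f"Cuda is available! Device: {torch.cuda.get_device_name(0)}"
    return "Cuda is not available"


if __name__ == "__main__":
    print(cuda_status())
```

If this reports that CUDA is not available even though a GPU is installed, the installed PyTorch build is often a CPU-only build; the CUDA guide in the Help section covers setup.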

Expectations for Devs:

Recommended Workflow:

1. Create an issue on the repository's GitHub Issues page.
2. Create a branch named after the issue.
3. Write code that fixes the issue.
4. Write tests that verify your fix works.
5. Comment your code so that others can understand it.
6. Open a pull request.

Known Problems / Limitations:

The tool only processes Python files.
The training dataset contains only Python repositories from GitHub, each of which depends on at least one machine learning library.
The set of labels cannot easily be changed yet; this is a work in progress.

Help:

PyTorch CUDA setup guide: https://www.geeksforgeeks.org/how-to-set-up-and-run-cuda-operations-in-pytorch/
GRaViTY, a tool for visualizing the metamodels: https://github.com/GRaViTY-Tool/gravity-tool?tab=readme-ov-file
