PSBench

A comprehensive benchmark for estimation of model accuracy (EMA) of protein complex structural models

I. Four datasets for training and testing EMA methods

PSBench consists of 4 complementary datasets:

1. CASP15_inhouse_dataset
2. CASP15_community_dataset
3. CASP16_inhouse_dataset
4. CASP16_community_dataset

For each of the four datasets, we provide 10 unique quality scores. The two in-house datasets additionally include several AlphaFold features:

Category	Quality scores / features
Global Quality Scores	tmscore (4 variants), rmsd
Local Quality Scores	lddt
Interface Quality Scores	ics, ics_precision, ics_recall, ips, qs_global, qs_best, dockq_wave
Additional Input Features (CASP15_inhouse_dataset and CASP16_inhouse_dataset)	type, afm_confidence_score, af3_ranking_score, iptm, num_inter_pae, mpDockQ/pDockQ

For detailed explanations of each quality score and feature, please refer to Quality_Scores_Definitions.
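The three ICS columns in the table are related. In CASP assessments the interface contact score is usually defined as the F1 score (harmonic mean) of interface-contact precision and recall. The sketch below illustrates that relationship under this assumption; the function name is hypothetical, and Quality_Scores_Definitions remains the authoritative description of how the values in PSBench were computed.

```python
def ics_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean (F1) of interface-contact precision and recall.

    Assumes the common CASP definition of ICS; returns 0.0 when both
    inputs are zero to avoid dividing by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A model that recovers every native contact (recall 1.0) but predicts twice as many contacts as the native (precision 0.5) would score 2/3 under this definition.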

i. CASP15_inhouse_dataset

CASP15_inhouse_dataset consists of a total of 7,885 models generated by MULTICOM3 during the 2022 CASP15 competition.

ii. CASP15_community_dataset

CASP15_community_dataset consists of a total of 10,942 models generated by all the participating groups during the 2022 CASP15 competition.

iii. CASP16_inhouse_dataset

CASP16_inhouse_dataset consists of a total of 1,009,050 models generated by MULTICOM4 during the 2024 CASP16 competition.

iv. CASP16_community_dataset

CASP16_community_dataset consists of a total of 12,904 models generated by all the participating groups during the 2024 CASP16 competition.

II. Scripts to evaluate EMA methods on a benchmark dataset

Generate various evaluation scores.
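The evaluation scripts themselves are not described in this section. As a rough illustration of what such scores can look like, the sketch below computes three metrics commonly used to compare EMA predictions with true quality scores for one target: Pearson correlation, Spearman correlation, and top-1 ranking loss. All function names are hypothetical, and the scripts in the repository may compute these or other scores differently.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; NaN if either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def ranking_loss(predicted, true):
    """True score of the best model minus true score of the top-ranked model."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[top]
```

These functions assume that higher values mean better models in both lists, so a score such as rmsd would need to be negated before use.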

III. Scripts to generate labels for a new benchmark dataset

The following prerequisites are needed to generate labels for a new benchmark dataset:

Data:

Predicted structures
Native structure
Fasta file

Tools:

Openstructure
USalign
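Before starting the label-generation steps, it can help to confirm that the inputs listed above are in place. The following is a minimal sketch, not part of the repository: the function name and its parameters are hypothetical, and it only checks that the paths exist and that the named executables are on the PATH.

```python
import shutil
from pathlib import Path


def check_prerequisites(predicted_dir, native_pdb, fasta, tools=("docker",)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if not Path(predicted_dir).is_dir():
        problems.append(f"predicted structures directory not found: {predicted_dir}")
    if not Path(native_pdb).is_file():
        problems.append(f"native structure file not found: {native_pdb}")
    if not Path(fasta).is_file():
        problems.append(f"fasta file not found: {fasta}")
    for tool in tools:
        # shutil.which returns None when the executable is not on the PATH
        if shutil.which(tool) is None:
            problems.append(f"executable not on PATH: {tool}")
    return problems
```

Calling `check_prerequisites("/path/to/predicted_pdbs_directory", "/path/to/native_pdb_file", "/path/to/fasta_file")` and printing the returned list gives a quick summary of anything missing.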

Download the PSBench repository and cd into scripts

    git clone https://github.com/BioinfoMachineLearning/PSBench.git
    cd PSBench
    cd scripts

OpenStructure installation (only needs to be run once)

    docker pull registry.scicore.unibas.ch/schwede/openstructure:latest

Check the docker installation with

    # should print the latest version of openstructure
    docker run -it registry.scicore.unibas.ch/schwede/openstructure:latest --version

Structure alignment and filtration (required for tmscore_usalign_aligned)

Requires 6 arguments:

-f : path to the fasta file for the target
-pp : path to the predicted pdbs directory for the target
-np : path to the native pdb file for the target
-o : path to the output directory
-tmp : path to the temporary directory
-c : path to the clustalw binary (available in tools/clustalw1.83/clustalw)

    python filter_pdb.py -f /path/to/fasta_file -pp /path/to/predicted_pdbs_directory -np /path/to/native_pdb_file -o /path/to/output_directory -tmp /path/to/temporary_directory -c /path/to/clustalw_binary_file
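When the filtering step is run for many targets, assembling the command in code avoids quoting mistakes with long paths. The sketch below builds the argument list using the six flags documented in the argument list above; the function name is hypothetical, and it assumes the command is launched from the scripts directory so that `filter_pdb.py` resolves.

```python
import sys


def build_filter_command(fasta, predicted_dir, native_pdb, out_dir, tmp_dir, clustalw):
    """Return the filter_pdb.py invocation as an argument list for subprocess."""
    return [
        sys.executable, "filter_pdb.py",
        "-f", str(fasta),            # fasta file for the target
        "-pp", str(predicted_dir),   # directory of predicted pdbs
        "-np", str(native_pdb),      # native pdb file
        "-o", str(out_dir),          # output directory
        "-tmp", str(tmp_dir),        # temporary directory
        "-c", str(clustalw),         # clustalw binary
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which raises an exception if the script exits with a non-zero status.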

Run openstructure (required for ics, ics_precision, ics_recall, ips, qs_global, qs_best, lddt, rmsd, dockq_wave, mmalign_tmscore)

Requires 3 arguments:

--indir : path to the folder containing predicted pdbs
--nativedir : path to the corresponding native pdb
--outdir : path to the output folder

    python run_openstructure.py --indir /path/to/predicted_pdb_folder/ --nativedir /path/to/native_pdb_file --outdir /path/to/output_folder

Run USalign for original predicted structure and original native structure (required for tmscore_usalign)

Requires 4 arguments:

--indir : path to the folder containing original predicted pdbs
--nativedir : path to the corresponding original native pdb
--outdir : path to the output folder
--usalign_program : path to the USalign binary (available at tools/USalign)

    python run_usalign.py --indir /path/to/predicted_pdb_folder/ --nativedir /path/to/native_pdb_file --outdir /path/to/output_folder --usalign_program /path/to/USalign_binary

Run USalign for filtered predicted structure and filtered native structure (required for tmscore_usalign_aligned)

Requires 4 arguments:

--indir : path to the folder containing filtered predicted pdbs
--nativedir : path to the corresponding filtered native pdb
--outdir : path to the output folder
--usalign_program : path to the USalign binary (available at tools/USalign)

    python run_usalign.py --indir /path/to/filtered_predicted_pdb_folder/ --nativedir /path/to/filtered_native_pdb_file --outdir /path/to/output_folder --usalign_program /path/to/USalign_binary
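The two USalign runs use the same script and flags and differ only in their inputs: the original structures yield tmscore_usalign, and the filtered structures yield tmscore_usalign_aligned. The sketch below makes that distinction explicit by building both commands with separate output folders. The function names and the output sub-folder names are hypothetical; only the four flags come from the argument lists above.

```python
import sys
from pathlib import Path


def build_usalign_command(indir, native, outdir, usalign_program):
    """Return one run_usalign.py invocation as an argument list."""
    return [
        sys.executable, "run_usalign.py",
        "--indir", str(indir),
        "--nativedir", str(native),
        "--outdir", str(outdir),
        "--usalign_program", str(usalign_program),
    ]


def usalign_runs(original, filtered, out_root, usalign_program):
    """Build both runs; `original` and `filtered` are (predicted_dir, native_pdb) pairs."""
    inputs = {
        "tmscore_usalign": original,          # original predicted and native structures
        "tmscore_usalign_aligned": filtered,  # outputs of the filtering step
    }
    return {
        name: build_usalign_command(pred, native, Path(out_root) / name, usalign_program)
        for name, (pred, native) in inputs.items()
    }
```

Keeping the two result folders separate matters because the CSV step takes them as two distinct arguments (`-tm_u` and `-tm_ua`).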

Create a CSV from the results

Requires 5 arguments:

-pp : path to the predicted pdbs directory for the target
-os : path to the openstructure results for the target
-tm_u : path to the tmscore_usalign results for the target
-tm_ua : path to the tmscore_usalign_aligned results for the target
-oc : path where the output csv is to be saved

    python create_csv.py -pp /path/to/predicted_pdbs_directory -os /path/to/openstructure_results_directory/ -tm_u /path/to/tmscore_usalign_results_directory -tm_ua /path/to/tmscore_usalign_aligned_results_directory -oc /path/to/output_csv_file
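Once the CSV exists, a quick sanity check is to list the best models by one of the scores. The layout of the file written by `create_csv.py` is not documented in this section, so the sketch below takes the column names as parameters rather than assuming them; the function name and the `model` column used in the usage note are hypothetical.

```python
import csv


def rank_models(handle, score_column, model_column, top_n=5, higher_is_better=True):
    """Return up to top_n (model, score) pairs sorted by the chosen score column."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    for column in (score_column, model_column):
        if column not in header:
            raise KeyError(f"column not found in CSV header: {column}")
    scored = []
    for row in reader:
        try:
            scored.append((row[model_column], float(row[score_column])))
        except (TypeError, ValueError):
            continue  # skip rows whose score is empty or not numeric
    scored.sort(key=lambda item: item[1], reverse=higher_is_better)
    return scored[:top_n]
```

For example, `rank_models(open("/path/to/output_csv_file", newline=""), "dockq_wave", "model")` would list the five highest-scoring models, provided the file has columns with those names. For rmsd, where lower is better, pass `higher_is_better=False`.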
