wenkai committed 3044fa8 (1 parent: b7b6da7)

Update README.md

Files changed (1): README.md (+77 -75)
README.md CHANGED
## Introduction
<p align="center">
<br>
<img src="assets/FAPM.png"/>
<br>
</p>

Hugging Face repo: *https://huggingface.co/wenkai/FAPM*

## Installation

1. (Optional) Create a conda environment:

```bash
conda create -n lavis python=3.8
conda activate lavis
```

2. For development, you can build from source:

```bash
git clone https://github.com/xiangwenkai/FAPM.git
cd FAPM
pip install -e .

pip install Biopython
pip install fair-esm
```

### Datasets
#### 1. Raw dataset
Raw data are available at *https://ftp.uniprot.org/pub/databases/uniprot/previous_releases/release-2023_04/knowledgebase/*. The file is very large and needs to be processed to extract each protein's name, sequence, GO labels, function description, and prompt.
The domain-level protein dataset we used is available at *https://ftp.ebi.ac.uk/pub/databases/interpro/releases/95.0/protein2ipr.dat.gz*.
In this repository, we provide the experimental train/val/test sets of Swiss-Prot, which are available at data/swissprot_exp.
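The preprocessing script itself is not shown here; as a rough, hypothetical sketch, Biopython (installed above) can walk the uncompressed Swiss-Prot flat file record by record and collect the fields named above. The file name and field selection below are assumptions for illustration, not the exact pipeline behind data/swissprot_exp:

```python
from Bio import SwissProt

# Hypothetical sketch: iterate the uncompressed Swiss-Prot flat file and pull the
# fields mentioned above; adapt the file name and filters to your own setup.
with open("uniprot_sprot.dat") as handle:
    for record in SwissProt.parse(handle):
        name = record.entry_name                                               # e.g. "CATA_HUMAN"
        sequence = record.sequence                                             # amino-acid string
        go_labels = [x[1] for x in record.cross_references if x[0] == "GO"]    # GO term IDs
        function = [c for c in record.comments if c.startswith("FUNCTION:")]   # function description
        organism = record.organism                                             # usable as the taxonomy prompt
```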
#### 2. ESM2 embeddings
Source code for ESM2 embedding generation: *https://github.com/facebookresearch/esm*
The generation command:
```bash
python esm_scripts/extract.py esm2_t36_3B_UR50D your_path/protein.fasta your_path_to_save_embedding_files --repr_layers 36 --truncation_seq_length 1024 --include per_tok
```
The default path for saving embedding files in this repository is **data/emb_esm2_3b**.
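To sanity-check a generated file, a small snippet like the one below can help. The dictionary layout ("label" / "representations") is what esm's extract.py typically writes with `--include per_tok`, so treat it as an assumption and adjust if your files differ:

```python
import torch

# Minimal sketch for inspecting one generated embedding file (key layout assumed
# from extract.py's per-token output format).
emb = torch.load("data/emb_esm2_3b/P18281.pt")
print(emb["label"])                       # identifier taken from the FASTA header
per_token = emb["representations"][36]    # per-residue embeddings from layer 36
print(per_token.shape)                    # (sequence_length, embedding_dim)
```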

## Pretrained language model
Source: *https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B*

## Training
- Data config: lavis/configs/datasets/protein/GO_defaults_cap.yaml
- Stage 1 config: lavis/projects/blip2/train/protein_pretrain_stage1.yaml
- Stage 1 training command: run_scripts/blip2/train/protein_pretrain_domain_stage1.sh
- Stage 2 config: lavis/projects/blip2/train/protein_pretrain_stage2.yaml
- Stage 2 training/finetuning command: run_scripts/blip2/train/protein_pretrain_domain_stage2.sh

## Trained models
The models are available at **https://huggingface.co/wenkai/FAPM/tree/main/model**.
You can also download our trained models from Google Drive: *https://drive.google.com/drive/folders/1aA0eSYxNw3DvrU5GU1Cu-4q2kIxxAGSE?usp=drive_link*
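If you prefer fetching a checkpoint programmatically rather than through the web UI, the huggingface_hub client can download individual files. The exact filename below is an assumption based on the model folder above and the inference example below:

```python
from huggingface_hub import hf_hub_download

# Sketch: download a single checkpoint from the Hub into the local cache.
# The filename is assumed, not confirmed; check the repo's model/ folder.
ckpt_path = hf_hub_download(repo_id="wenkai/FAPM", filename="model/checkpoint_mf2.pth")
print(ckpt_path)  # local path to pass as --model_path in the inference example
```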

## Testing
- Config: lavis/projects/blip2/eval/caption_protein_eval.yaml
- Command: run_scripts/blip2/eval/eval_cap_protein.sh

## Inference example
```bash
python FAPM_inference.py \
    --model_path model/checkpoint_mf2.pth \
    --example_path data/emb_esm2_3b/P18281.pt \
    --device cuda \
    --prompt Acanthamoeba
```
68
+
69
+
70
+
71
+
72
+
73
+
74
+
75
+
76
+
77
+