WMC001/QuestionClassifier

Question Classifier

Git repo: Question Classifier

pipeline

tokenization -> word embedding -> sentence vector -> classifier training
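The first three stages can be illustrated with a minimal, standard-library-only sketch. It is not the repository's code: the toy embedding table, the `UNK` fallback vector, and the function names `tokenize`, `embed`, and `sentence_vector` are all assumptions made for illustration, and the sentence vector shown is a plain bag-of-words average.

```python
import re

# Hypothetical toy embedding table (assumption); real vectors would come
# from a pretrained embedding file.
EMBEDDINGS = {
    "what": [0.1, 0.3],
    "is": [0.0, 0.2],
    "python": [0.5, 0.1],
}
UNK = [0.0, 0.0]  # fallback for out-of-vocabulary tokens (assumption)


def tokenize(question):
    """Lowercase the question and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", question.lower())


def embed(tokens):
    """Look up one vector per token, falling back to UNK."""
    return [EMBEDDINGS.get(tok, UNK) for tok in tokens]


def sentence_vector(vectors):
    """Bag-of-words sentence vector: element-wise mean of the word vectors."""
    if not vectors:
        return list(UNK)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


tokens = tokenize("What is Python?")
vec = sentence_vector(embed(tokens))
print(tokens, vec)
```

The resulting fixed-length vector is what a classifier would take as input in the final stage.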

folder structure

.
├── README.md
├── data
│   ├── config
│   │   ├── xxx.config
│   ├── dev.txt
│   ├── glove.small.txt
│   ├── labels.txt
│   ├── raw_data.txt
│   ├── stopword.txt
│   ├── train.txt
│   ├── trec.txt
│   └── vocabulary.txt
├── document
│   ├── README.md
│   ├── document.md
│   └── document.pdf
├── src
│   ├── classifier
│   │   ├── __init__.py
│   │   └── network.py
│   ├── config.ini
│   ├── config.py
│   ├── dataloader.py
│   ├── model.py
│   ├── question_classifier.py
│   ├── sentVect
│   │   ├── __init__.py
│   │   ├── bow.py
│   │   ├── bow_bilstm.py
│   │   └── mybilstm.py
│   └── utils
│       ├── __init__.py
│       ├── file_preload.py
│       └── preprocess.py
└──

commit msg

[your task]: what you did in this commit

e.g.: 'wordEmbedding: word2vec model initialize'

...

environment

Development and testing environment: macOS 10.15.7, Anaconda Python 3.8, 8th-generation Intel Core i5 CPU, 16 GB RAM.

Training set: 5,500 labeled questions

Test set: TREC 10

run

mkdir data/models
cd src

Preprocess the data with the --preprocess flag. Preprocessing must be completed before training.

python3 question_classifier.py --preprocess --config [config-file-path]

dev training mode: hold out 10% of the training set as a validation set

python3 question_classifier.py --dev --config [config-file-path]
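The 10% hold-out can be sketched as follows. This is only an illustration of the idea, not the repository's implementation: the helper name `split_train_dev`, the shuffling, and the fixed seed are assumptions, and the placeholder questions stand in for the real training data.

```python
import random


def split_train_dev(examples, dev_fraction=0.1, seed=0):
    """Shuffle a copy of the examples and hold out a fraction for validation."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    n_dev = round(len(shuffled) * dev_fraction)
    return shuffled[n_dev:], shuffled[:n_dev]


# Placeholder data (assumption): 5,500 dummy questions.
questions = [f"question {i}" for i in range(5500)]
train, dev = split_train_dev(questions)
print(len(train), len(dev))  # 4950 550
```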

training mode: train the model on the whole training set (no validation split)

python3 question_classifier.py --train --config [config-file-path]

test mode: load an existing model and evaluate it on the TREC 10 dataset

python3 question_classifier.py --test --config [config-file-path]
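For readers who want to see how such a command line could be parsed, here is a minimal `argparse` sketch. The flag names come from the commands above; everything else is an assumption, including treating the four modes as mutually exclusive, making `--config` required, the helper names `build_parser` and `selected_mode`, and the `example.config` path.

```python
import argparse

MODES = ("preprocess", "dev", "train", "test")


def build_parser():
    """Build a parser with one mode flag per run mode plus --config."""
    parser = argparse.ArgumentParser(prog="question_classifier.py")
    # Assumption: exactly one mode flag must be given per run.
    mode = parser.add_mutually_exclusive_group(required=True)
    for name in MODES:
        mode.add_argument(f"--{name}", action="store_true")
    parser.add_argument("--config", required=True, help="path to the config file")
    return parser


def selected_mode(args):
    """Return the name of the mode flag that was set."""
    for name in MODES:
        if getattr(args, name):
            return name
    return None


# Hypothetical invocation; "example.config" is a placeholder path.
args = build_parser().parse_args(["--dev", "--config", "example.config"])
print(selected_mode(args), args.config)
```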
