Claim-Evidence Relationships: Transformers for NLP

Data Analytics | Machine Learning | Classification | RoBERTa | Ensembles

Preamble

This project was completed as part of the Machine Learning module at SMU. The objective was to train a model that can determine the relationship between sentences. As of 2022, the state-of-the-art method of determining such relationships was transformers built on the attention mechanism: by using learned queries, keys and values to encode a sentence, we are able to extract different types of features, be they topological or lexicographical.
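
For reference, the scaled dot-product attention at the core of these models is shown below as a minimal NumPy sketch with toy dimensions; this is illustrative only, not the project's model code.

import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise query-key affinities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # weighted sum of values

rng = np.random.default_rng(0)
tokens = rng.normal(size=(3, 4))                # 3 tokens, 4-dim embeddings (toy values)
print(attention(tokens, tokens, tokens).shape)  # (3, 4) -- self-attention output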

The training data is a list of sentence pairs, one sentence called the evidence and the other the claim. Each pair is labeled as having a 'support', 'refute' or 'none' relationship. In the transformer model each sentence is encoded, and the encodings are cross-multiplied through attention to create a joint representation, which the model learns to classify as one of the three labels.
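
To make the setup concrete, below is a minimal sketch of sentence-pair classification with the Hugging Face transformers library; the example sentences and the label ordering are illustrative assumptions, and this is not the project's tuned model.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large", num_labels=3)

claim = "Vitamin C cures the common cold."                          # toy claim
evidence = "Trials show no effect of vitamin C on cold incidence."  # toy evidence
# Encoding the pair together lets attention cross between claim and evidence tokens.
inputs = tokenizer(claim, evidence, truncation=True, max_length=200, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # index into an assumed ['support', 'refute', 'none'] ordering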

The report explaining the project is shown below. Most of the exploration and writing was completed by me; however, the base code that sets up the RoBERTa transformer model for training was adapted from https://github.com/teacherpeterpan/Zero-shot-Fact-Verification.

The notebook appended below dives straight into the first of six stages:

  1. Data Extraction
  2. Sentence Similarity Exploration
  3. Named Entity Recognition Exploration
  4. Data Formatting for Transformer Model (ZSFV)
  5. Integrating Named Entity Recognition into Model
  6. Ensembling Experiments

Extract Data

Functions for Sentence Similarity
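
Below is a minimal sketch of the kind of similarity helpers this section builds, assuming spaCy word vectors (the en_core_web_md model is an assumption; the notebook's actual functions may differ).

import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")  # any spaCy model with word vectors

def cosine_similarity(a, b):
    # Cosine similarity between two vectors; 0.0 if either is all-zero.
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def sentence_similarity(claim, evidence):
    # Compare the averaged token vectors of the two sentences.
    return cosine_similarity(nlp(claim).vector, nlp(evidence).vector)

print(sentence_similarity("Aspirin reduces fever.", "The drug lowers body temperature."))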

Check Similarity for Unrelated Claim and Evidence

Named Entity Recognition

The intention is to extract entity names from the claims and the evidence, and to bypass the overall similarity check when similar entities appear in both the claim and the evidence.
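
A minimal sketch of that bypass using the scispaCy en_core_sci_sm model (the three models explored below); exact-string entity overlap is an assumption here, and the notebook's matching rule may differ.

import spacy

nlp = spacy.load("en_core_sci_sm")  # scispaCy model for biomedical text

def entities(text):
    # Lower-cased entity strings found by the NER pipeline.
    return {ent.text.lower() for ent in nlp(text).ents}

def share_entity(claim, evidence):
    # True if any entity string appears in both sentences.
    return bool(entities(claim) & entities(evidence))

# If share_entity(claim, evidence) is True, skip the sentence-level similarity check.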

en_core_sci_sm

en_core_sci_lg

en_core_sci_scibert

NER Characteristics for Claim and Evidence

Thoughts

There could be words or character combinations (e.g. chemical compounds) that are not in the embeddings. In such cases we need a way to remove non-entity words from the sentence and then check for the entity names in both the claim and the evidence.

There should be a smarter way of checking entities between claim and evidence (than just looking for an exact word match) to determine whether they match. Fuzzy matching for each word pair? Or just applying cosine similarity to a merge of all the entities?
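
As a rough sketch of the fuzzy option, the standard library's difflib can score entity pairs; the 0.8 threshold is an assumption, not a tuned value.

from difflib import SequenceMatcher

def fuzzy_match(a, b, threshold=0.8):
    # True if the two strings are similar enough character-wise.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def fuzzy_share_entity(claim_ents, evidence_ents, threshold=0.8):
    # True if any claim entity fuzzily matches any evidence entity.
    return any(fuzzy_match(c, e, threshold) for c in claim_ents for e in evidence_ents)

print(fuzzy_match("acetylsalicylic acid", "acetyl-salicylic acid"))  # True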

Formatting Data for ZSFV
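
A minimal sketch of producing the training JSON, assuming flat records with 'claim', 'evidence' and 'label' fields; the field names and the toy rows are assumptions and should be checked against what run_hover.py actually expects.

import json
import pandas as pd

# Toy rows standing in for the real training set (values are placeholders).
df = pd.DataFrame({
    "claim": ["Aspirin reduces fever."],
    "evidence": ["Aspirin is an antipyretic."],
    "label": ["support"],
})

records = [
    {"claim": r.claim, "evidence": r.evidence, "label": r.label}
    for r in df.itertuples(index=False)
]
with open("project_train_data.json", "w") as f:  # placed under ../data/ for run_hover.py
    json.dump(records, f, indent=2)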

ZSFV Model Training and Evaluation Commands

python run_hover.py --model_type roberta --model_name_or_path roberta-large --do_train --do_lower_case --per_gpu_train_batch_size 16 --learning_rate 1e-5 --num_train_epochs 5.0 --evaluate_during_training --max_seq_length 200 --max_query_length 60 --gradient_accumulation_steps 2 --max_steps 60 --save_steps 60 --logging_steps 60 --overwrite_cache --num_labels 3 --data_dir ../data/ --train_file project_train_data.json --predict_file test_phase_1_update.json --output_dir ./output/roberta_zero_shot

python run_hover.py --model_type roberta --model_name_or_path roberta-large --do_eval --do_lower_case --per_gpu_train_batch_size 16 --learning_rate 1e-5 --num_train_epochs 5.0 --evaluate_during_training --max_seq_length 200 --max_query_length 60 --gradient_accumulation_steps 2 --max_steps 20000 --save_steps 1000 --logging_steps 1000 --overwrite_cache --num_labels 3 --data_dir ../data/ --train_file fever_train_data.json --predict_file test_phase_1_update_human.json --output_dir "./tuned_model_lr1e5_bs16_s75(5)/roberta_zero_shot"

python run_hover.py --model_type roberta --model_name_or_path roberta-large --do_eval --do_lower_case --per_gpu_train_batch_size 16 --learning_rate 1e-5 --num_train_epochs 5.0 --evaluate_during_training --max_seq_length 200 --max_query_length 60 --gradient_accumulation_steps 2 --max_steps 20000 --save_steps 1000 --logging_steps 1000 --overwrite_cache --num_labels 3 --data_dir ../data/ --train_file fever_train_data.json --predict_file test_phase_2_update.json --output_dir "./scifact_model3(91.33)/roberta_zero_shot"

python run_hover.py --model_type roberta --model_name_or_path roberta-large --do_eval --do_lower_case --per_gpu_train_batch_size 16 --learning_rate 1e-5 --num_train_epochs 5.0 --evaluate_during_training --max_seq_length 200 --max_query_length 60 --gradient_accumulation_steps 2 --max_steps 20000 --save_steps 1000 --logging_steps 1000 --overwrite_cache --num_labels 3 --data_dir ../data/ --train_file fever_train_data.json --predict_file test_phase_1_update_human2.json --output_dir "./tuned_model_lr1e5_bs16_s75(5)/roberta_zero_shot"

python run_hover.py --model_type roberta --model_name_or_path "./tuned_model_lr1e5_bs16_s75(5)/roberta_zero_shot/best_model" --do_train --do_lower_case --per_gpu_train_batch_size 16 --learning_rate 1e-5 --num_train_epochs 5.0 --evaluate_during_training --max_seq_length 200 --max_query_length 60 --gradient_accumulation_steps 2 --max_steps 100 --save_steps 100 --logging_steps 100 --overwrite_cache --num_labels 3 --data_dir ../data/ --train_file scifact_train_dev.json --predict_file test_phase_1_update_human_2.json --output_dir ./output/roberta_zero_shot

python run_hover.py --model_type roberta --model_name_or_path ./sample_model_3/roberta_zero_shot/best_model --do_train --do_lower_case --per_gpu_train_batch_size 16 --learning_rate 1e-5 --num_train_epochs 5.0 --evaluate_during_training --max_seq_length 200 --max_query_length 60 --gradient_accumulation_steps 2 --max_steps 1000 --save_steps 100 --logging_steps 100 --overwrite_cache --num_labels 3 --data_dir ../data/ --train_file scifact_train_dev_sampled.json --predict_file test_phase_1_update_human_2.json --output_dir ./output/roberta_zero_shot

python run_hover2.py --model_type bert --model_name_or_path microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext --do_train --do_lower_case --per_gpu_train_batch_size 16 --learning_rate 1e-5 --num_train_epochs 5.0 --evaluate_during_training --max_seq_length 200 --max_query_length 60 --gradient_accumulation_steps 1 --save_steps 60 --logging_steps 60 --overwrite_cache --num_labels 3 --data_dir ../data/ --train_file scifact_train_dev_sampled.json --predict_file scifact_train_dev.json --output_dir ./output/roberta_zero_shot

python run_hover2.py --model_type bert --model_name_or_path microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext --do_eval --do_lower_case --per_gpu_train_batch_size 16 --learning_rate 1e-5 --num_train_epochs 5.0 --evaluate_during_training --max_seq_length 200 --max_query_length 60 --gradient_accumulation_steps 1 --save_steps 60 --logging_steps 60 --overwrite_cache --num_labels 3 --data_dir ../data/ --train_file scifact_train_dev_sampled.json --predict_file test_phase_2_update.json --output_dir ./pubmed_tuned_model_40ep_bs16_lr1e5/roberta_zero_shot


Data Formatting

Force Label Unrelated Claim-Evidence

Pairs are force-labeled as unrelated based on the absence of any shared entity between the claim and the evidence. The forced labels are stored in the column 'related'. Among the related claim-evidence pairs there is still a need to differentiate between support and refute.
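
A minimal sketch of the force-labeling step in pandas; the toy rows, the 'none' label string, and the token-overlap stand-in (in place of the NER-based share_entity() sketched earlier) are all assumptions.

import pandas as pd

df = pd.DataFrame({
    "claim": ["Aspirin reduces fever.", "Aspirin reduces fever."],
    "evidence": ["Aspirin is an antipyretic.", "The stock market rose today."],
    "label": ["support", "support"],  # toy labels to be corrected
})

# Stand-in for the NER-based share_entity() sketched earlier: plain token overlap.
def share_entity(claim, evidence):
    return bool(set(claim.lower().split()) & set(evidence.lower().split()))

df["related"] = [share_entity(c, e) for c, e in zip(df["claim"], df["evidence"])]
df.loc[~df["related"], "label"] = "none"  # force-label pairs with no shared entity
print(df)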

Convert JSON Prediction File to txt
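
A minimal sketch of the conversion, assuming the prediction file is a JSON object mapping example ids to label strings; the actual schema written by run_hover.py may differ.

import json

with open("predictions.json") as f:  # hypothetical prediction file
    preds = json.load(f)

with open("predictions.txt", "w") as f:
    for example_id, label in preds.items():
        f.write(f"{example_id},{label}\n")  # one "id,label" row per line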

NER Enhancements to Prediction

Ensembling

Ensemble (3 Models)

The third model degrades performance, so it is not included in the ensemble.

Weighted Ensemble (Phase 1 Final)
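
A minimal sketch of weighted soft voting over per-model class probabilities; the toy probabilities and the weights are placeholders, not the tuned Phase 1 values.

import numpy as np

# Per-model predicted probabilities, shape (n_examples, 3 classes); toy values.
model_a = np.array([[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]])
model_b = np.array([[0.5, 0.3, 0.2], [0.2, 0.2, 0.6]])

weights = np.array([0.6, 0.4])                    # placeholders; tuned on validation in practice
stacked = np.stack([model_a, model_b])            # (n_models, n_examples, n_classes)
blended = np.tensordot(weights, stacked, axes=1)  # weighted sum over the model axis
final = blended.argmax(axis=-1)                   # ensemble label per example
print(final)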

Final

Weighted Ensemble (Phase 2 First Submission)

Final

Weighted Ensemble (Phase 2 Second Submission)

Final results: out of a maximum of 228 possible prediction changes, the weights were tuned to allow 186.
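
A minimal sketch of how the number of changed predictions can be tracked while tuning the weights; the label arrays are placeholders.

import numpy as np

baseline = np.array([0, 1, 2, 1])  # labels from the single best model (toy values)
ensemble = np.array([0, 2, 2, 1])  # labels after weighted voting (toy values)

n_changes = int((baseline != ensemble).sum())
print(n_changes)  # tune the ensemble weights until this stays within the change budget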