A Multi-Model Approach to Relation Extraction: BERT Embeddings with XGBoost and Graph Neural Networks
This project focuses on relation extraction, comparing two major approaches: a traditional machine learning model and a graph-based neural network. We implement XGBoost as our traditional model and a Graph Convolutional Network (GCN) as our graph-based approach, using the Re-DocRED dataset. For both models, the input data is pre-processed and converted into BERT embeddings to capture rich contextual features. The goal is to evaluate and compare how well these two methodologies extract relations from text.
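For reference, the sketch below shows the general pattern used to turn text into contextual BERT embeddings with the Hugging Face transformers library; the model name (bert-base-uncased) and the mean-pooling strategy are illustrative assumptions, and the notebooks may use a different setup.

```python
# Minimal sketch of BERT embedding extraction. The model name and the
# mean-pooling strategy are assumptions; see the notebooks for the exact setup.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(text: str) -> torch.Tensor:
    """Return a single mean-pooled BERT embedding for a piece of text."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    # outputs.last_hidden_state has shape (1, seq_len, 768); average over tokens
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

vector = embed("Albert Einstein was born in Ulm, Germany.")
print(vector.shape)  # torch.Size([768])
```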
To get a local copy up and running, follow these steps:
- Python 3.12 or higher installed
- pip (Python package manager)
- conda (optional)
- Create a virtual environment (optional but recommended)
```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```
- Install the required dependencies
```
pip install -r requirements.txt
```
Please note that the torch-cluster library requires a C++ compiler to build; Microsoft's Visual Studio Build Tools is the standard choice on Windows and should already be installed.
Minimum components: MSVC C++ build tools, Windows SDK
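As an optional sanity check after installation, the following snippet confirms that the key dependencies (in particular torch-cluster, which depends on the C++ build tools) import cleanly; it is only a suggestion and not part of the project notebooks.

```python
# Optional sanity check: verify that the core dependencies import cleanly.
# torch_cluster is the import most likely to fail if the C++ build tools
# were missing when it was installed.
import torch
import torch_cluster
import xgboost
import transformers

print("torch:", torch.__version__)
print("xgboost:", xgboost.__version__)
print("transformers:", transformers.__version__)
print("torch-cluster imported successfully")
```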
- Aishwarya Kulkarni
- Vaibhav Parihar
- Shreya Varghese
- Chirag Tolani
Please refer to the following link for the dataset used in this project: https://github.com/tonytan48/Re-DocRED
The folder and file structure of our project is as follows:
- requirements.txt - This file lists all the required libraries that need to be installed and imported in the notebook files. The installation guide for these libraries is detailed in the Installation section
- bert_embeddings - The extracted BERT embeddings saved during the pre-processing step for both models
- XGB_model_and_embeddings - This folder contains the model file saved during the training loop (.pkl file) as well as the optimal thresholds used to improve the XGBoost model's performance
- GCN_model_and_embeddings - This folder contains the model file saved during the training loop (.pkl file) as well as the optimal thresholds used to improve the GCN model's performance
- bert_with_xgboost.ipynb - This notebook contains the preprocessing of the dataset into BERT embeddings as well as the training code for the XGBoost model
- bert_with_gcn.ipynb - This notebook contains the preprocessing of the dataset into BERT embeddings as well as the training code for the GCN model
- xgboost_inference.ipynb - This notebook contains the code to perform real-time inference with the XGBoost model and predict relations for the provided user input
- gcn_inference.ipynb - This notebook contains the code to perform real-time inference with the GCN model and predict relations for the provided user input
- rel_info.json - Contains the relation mapping required for real-time inference with both models (see the sketch after this list)
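For orientation, the sketch below outlines the kind of steps the inference notebooks perform with the saved artifacts; the file names inside XGB_model_and_embeddings and the embedding size are placeholders, so refer to xgboost_inference.ipynb for the actual logic.

```python
# Rough sketch of real-time inference with the saved XGBoost model.
# The artifact file names and the embedding size below are placeholders;
# xgboost_inference.ipynb contains the actual loading and prediction code.
import json
import pickle
import numpy as np

with open("XGB_model_and_embeddings/xgb_model.pkl", "rb") as f:
    model = pickle.load(f)
with open("XGB_model_and_embeddings/optimal_thresholds.pkl", "rb") as f:
    thresholds = pickle.load(f)      # per-relation decision thresholds
with open("rel_info.json") as f:
    rel_info = json.load(f)          # relation ID -> human-readable name

# One BERT-based feature vector for a candidate entity pair (placeholder size).
pair_embedding = np.random.rand(1, 1536)
probs = model.predict_proba(pair_embedding)[0]

# Keep the relation classes whose probability clears their tuned threshold;
# how class indices map back to rel_info IDs depends on the training setup.
predicted = [i for i, p in enumerate(probs) if p >= thresholds[i]]
print(predicted)
```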
Please note that the same zip file contains three main dataset files—train_revised.json, test_revised.json, and dev_revised.json—which are the training data, testing data, and validation data, respectively.
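The snippet below shows how one of these files can be inspected; the field names follow the public DocRED/Re-DocRED annotation format (title, sents, vertexSet, labels) and should be verified against the downloaded files.

```python
# Peek at the Re-DocRED training split. Field names follow the public
# DocRED/Re-DocRED annotation format; verify them against the downloaded files.
import json

with open("train_revised.json") as f:
    docs = json.load(f)

doc = docs[0]
print(doc["title"])              # document title
print(len(doc["sents"]))         # tokenized sentences
print(len(doc["vertexSet"]))     # entity clusters (one list of mentions per entity)
for label in doc["labels"][:3]:  # relation triples: head index, tail index, relation ID
    print(label["h"], label["t"], label["r"])
```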
We have also included a short research paper (RE-paper-final.pdf) detailing the approaches applied to the dataset for relation extraction, a discussion of related work, the methodology behind each approach, and an evaluation of their performance. The paper also provides a comparative analysis of the two models and discusses the limitations faced during the implementation phase.