VTechWorks

VTechWorks provides global access to Virginia Tech scholarship, including journal articles, books, theses, dissertations, conference papers, slide presentations, technical reports, working papers, administrative documents, videos, images, and more by faculty, students, and staff. Faculty can deposit items to VTechWorks from Elements, including journal articles covered by the University open access policy. Email vtechworks@vt.edu for help.

Recent Submissions

Pesticides & Pollinators: Poison in our Tributaries

Kafka, Bepe; Cox, Doug; Tejero, Miguel (New River Symposium, 2024-04-12)

A poster about the effects of four pesticides on our water and our pollinators: Atrazine, Chlorpyrifos, Glyphosate, and Neonicotinoids.

Virginia-Maryland College of Veterinary Medicine Strategic Plan 2020 - 2026

(Virginia Tech, 2020)

This Virginia-Maryland College of Veterinary Medicine strategic plan has been developed in alignment with Virginia Tech's long-term Beyond Boundaries future and the university's strategic plan — "The Virginia Tech Difference: Advancing Beyond Boundaries" — including its four strategic priorities: advance regional, national, and global impact; elevate the Ut Prosim (That I May Serve) difference; be a destination for talent; and ensure institutional excellence.

Team 2 : Search and Recommendation

Maheshwari, Ujjwal; Khandelwal, Aseem; Ram, Nikhil; Bhamidipati, Harsha; Banuelos, Jason (Virginia Tech, 2023-12-06)

Theses and dissertations represent significant bodies of work accomplished by others, often containing remarkable contributions. The advent of electronic theses and dissertations (ETDs) aimed to simplify the storage and accessibility of these documents. However, their true value is realized when accompanied by an effective system for searching and retrieving specific documents. Our project involved building an Information Retrieval System that supports searching, ranking, browsing and recommendations for a large collection of ETDs. We divided the main goal into two modules - Search and Recommendation. Search is accomplished using Elasticsearch. An overview of the tool is given in the report, along with goals and the implementation process. A recommendation module will provide relevant recommendations for a user, built by experimenting with multiple algorithms in order to obtain the best results. The user manual has been provided for the reference of other groups. The developer manual includes how the project was developed, including architecture, data flow, module overviews, etc. The final report provides an overview of the tasks undertaken, how we planned to achieve our goals, milestones and our timelines. By the project's conclusion, we successfully scaled the system to manage 500K ETDs. Our efforts resulted in enhancements, particularly in bulk indexing and achieving faster response times for searches. Additionally, we refined the existing index schema and implemented a logging mechanism within Elasticsearch to accommodate logs from all collaborating teams.

Team 5 - Infrastructure and DevOps Fall 2023

Adeyemi Aina; Amritha Subramanian; Hung-Wei Hsu; Shalini Rama; Vasundhara Gowrishankar; Yu-Chung Cheng (2024-01-17)

The project aims to revolutionize information retrieval from extensive academic repositories like theses and dissertations by developing an advanced system. Unlike conventional search engines, it focuses on handling complex academic documents. Six dedicated teams oversee different facets: Knowledge Graph, Search and Indexing, Object Detection and Topic Analysis, Language Models, Integration, and User Interaction. The Infrastructure and DevOps team is responsible for integration: it orchestrates collaborative efforts, manages database access, and ensures seamless communication among components via APIs. The team also oversees container utilization in the CI/CD pipeline, maintains the container cluster, and tailors APIs for specific team needs. Expressing gratitude for previous contributions, the team has made notable progress in migrating to Endeavour, establishing a robust CI/CD pipeline, updating the database schema, tackling Kafka challenges, and deploying authentication services, while creating accessible filesystem and database APIs for other teams.

Team 3: Object Detection and Topic Modeling (Fall 2023)

Amr Ahmed Aboelnaga; Anushka Sivakumar; Jayanth Narla; Pradyumna Upendra Dasu; Ragul Seetharaman; Sahana Bhaskar; Shankar Srinidhi Srinivas (2024-01-08)

Under the guidance of Dr. Edward A. Fox, the CS 5604: Information Storage and Retrieval class (Fall 2023) was tasked with developing a cutting-edge information retrieval system for Electronic Theses and Dissertations (ETDs). We used learning algorithms on a large ETD collection to classify closely related documents. The project’s overarching objective is to enhance the already available service, which enables users to upload, search, and retrieve ETDs along with their associated digital objects in a human-readable format. Our team’s specific assignment is to use object detection and topic modeling to analyze documents and thereby assist in building a system that supports searching and retrieving documents using topics and user-defined digital objects, and enables experimenters to conduct further research into objects and topics. To this end, we have implemented object detection on 200 segmented ETDs and topic modeling using BERTopic (BERT embeddings) and LDA (Latent Dirichlet Allocation) on nearly 334k ETDs. The object detection and topic modeling pipelines have been modified to utilize APIs (Application Programming Interfaces) for populating database tables related to ETDs. Each ETD page is converted into an image and stored in the file system, with corresponding entries in the database. Additionally, all detected objects are stored both in the database and the file system. The generated XMLs now include an object ID for each detected object, facilitating the capture of structural relationships using knowledge graphs (Team 1). Efforts have also been invested in enhancing chapter segmentation in XMLs. This involves exploring and experimenting with the LLaMA 2 model, ResNet model, and clustering approaches to accurately identify the start and end pages of chapters. The topic modeling results using BERTopic were not satisfactory, leading to exploration of the LDA model. Switching to the LDA model has provided promising outputs. The topics generated using LDA were refined using various pre-processing techniques and given to Team 6 to be used in the sign-up page, and to Team 2 for indexing.
