Enhancing Aircraft Type Matching Across Data Sources Using Entity Resolution

Bryan, Karna

Files

Bryan_updated.pdf (2.31 MB)

Authors

Bryan, Karna

Issue Date

2026-03

Type

Dissertation

Language

en

Keywords

deep learning, aviation safety, entity resolution, Business, Engineering, Science, & Technological Innovation

Abstract

This study addresses the challenge of merging aviation safety data from diverse sources that lack shared identifiers, which limits the ability to combine safety events and utilization information for risk assessment. Inconsistent aircraft type representations make it difficult to align records across systems at scale. The purpose of this constructive research study was to develop and evaluate an aircraft type entity matching capability that supports normalization to a common aircraft type reference representation. A manually curated gold dataset was constructed by mapping aircraft type taxonomies from the Federal Aviation Administration and the International Civil Aviation Organization. With this labeled dataset, a deterministic baseline matcher was implemented using feature similarity scoring and thresholding. Then, a deep learning matcher was trained using a transformer-based pairwise classification approach that serializes record fields into a textual representation. Experiments evaluated negative sampling strategies, schema variation across sources, and multi-source union training. Performance was assessed on held-out test partitions and under a top-1 linkage selection strategy. In-domain transfer was assessed on additional “dirty” aviation datasets using make-constrained candidate generation, manual audits, and a fully labeled Bureau of Transportation Statistics case to quantify transfer versus source-specific fine-tuning. Results showed that the deep learning approach outperformed the deterministic baseline, remained effective under schema variation, and exhibited promising transfer behavior. The findings indicate that deep learning approaches provide a practical foundation for aircraft type normalization across heterogeneous sources, enabling cross-source safety analyses that were previously difficult to conduct.
Future work should prioritize automated blocking and standardization, broader labeling for additional in-domain sources, and systematic study of negative candidate selection under deployment-like candidate distributions.
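
The abstract names several building blocks without showing them: make-constrained candidate generation, feature similarity scoring with a threshold, top-1 linkage selection, and serialization of record fields into text. The following is a minimal sketch of how those pieces could fit together, not the dissertation's implementation. Every name here (`REFERENCE`, `normalize`, `similarity`, `candidates`, `serialize`, `match_top1`), the sample records, the 0.6 threshold, the use of `difflib` string similarity, and the `[COL]`/`[VAL]` tagging are assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical reference table; illustrative rows, not the study's gold dataset.
REFERENCE = [
    {"make": "BOEING", "model": "737-800", "designator": "B738"},
    {"make": "BOEING", "model": "747-400", "designator": "B744"},
    {"make": "AIRBUS", "model": "A320-200", "designator": "A320"},
]


def normalize(text):
    """Uppercase, treat hyphens as spaces, and collapse repeated whitespace."""
    return " ".join(text.upper().replace("-", " ").split())


def similarity(a, b):
    """String similarity in [0, 1]; a stand-in for the study's feature scoring."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def candidates(record, reference):
    """Make-constrained candidate generation: keep rows with the same make."""
    make = normalize(record["make"])
    return [ref for ref in reference if normalize(ref["make"]) == make]


def serialize(record):
    """Flatten a record into tagged text, as a pairwise text classifier might
    consume it. The [COL]/[VAL] tags are an assumed convention."""
    return " ".join(f"[COL] {key} [VAL] {value}" for key, value in record.items())


def match_top1(record, reference, threshold=0.6):
    """Score each candidate, keep the single best, and accept it only if its
    score reaches the threshold; otherwise report no match (None)."""
    scored = [
        (similarity(record["model"], ref["model"]), ref)
        for ref in candidates(record, reference)
    ]
    if not scored:
        return None
    best_score, best_ref = max(scored, key=lambda pair: pair[0])
    return best_ref if best_score >= threshold else None


if __name__ == "__main__":
    dirty = {"make": "Boeing", "model": "B737 800"}
    print(serialize(dirty))
    print(match_top1(dirty, REFERENCE))
```

In this sketch the learned matcher described in the abstract would replace `similarity`: each candidate pair would be serialized and scored by a trained classifier, while the blocking and top-1 selection steps stay the same.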

URI

https://hdl.handle.net/20.500.11803/5264

Collections

NU Masters and Dissertations (Open Access)
