Enhancing Aircraft Type Matching Across Data Sources Using Entity Resolution

Authors

Bryan, Karna

Issue Date

2026-03

Type

Dissertation

Language

en

Keywords

deep learning, aviation safety, entity resolution, Business, Engineering, Science, & Technological Innovation

Abstract

This study addresses the challenge of merging aviation safety data from diverse sources that lack shared identifiers, which limits the ability to combine safety events and utilization information for risk assessment. Inconsistent aircraft type representations make it difficult to align records across systems at scale. The purpose of this constructive research study was to develop and evaluate an aircraft type entity matching capability that supports normalization to a common aircraft type reference representation. A manually curated gold dataset was constructed by mapping aircraft type taxonomies from the Federal Aviation Administration and the International Civil Aviation Organization. With this labeled dataset, a deterministic baseline matcher was implemented using feature similarity scoring and thresholding. Then, a deep learning matcher was trained using a transformer-based pairwise classification approach that serializes record fields into a textual representation. Experiments evaluated negative sampling strategies, schema variation across sources, and multi-source union training. Performance was assessed on held-out test partitions and under a top-1 linkage selection strategy. In-domain transfer was assessed on additional “dirty” aviation datasets using make-constrained candidate generation, manual audits, and a fully labeled Bureau of Transportation Statistics case to quantify transfer versus source-specific fine-tuning. Results showed that the deep learning approach outperformed the deterministic baseline, remained effective under schema variation, and showed promising transfer behavior. The findings indicate that deep learning approaches provide a practical foundation for aircraft type normalization across heterogeneous sources, enabling cross-source safety analyses that were previously difficult to conduct.
Future work should prioritize automated blocking and standardization, broader labeling for additional in-domain sources, and systematic study of negative candidate selection under deployment-like candidate distributions.
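
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the `COL ... VAL ...` serialization format, the field names, the weights, the 0.8 threshold, and the use of `difflib` for string similarity are all assumptions chosen for illustration. The sketch shows field serialization (the kind of text a pairwise transformer classifier might consume), a deterministic similarity-and-threshold baseline, make-constrained candidate generation, and top-1 selection.

```python
from difflib import SequenceMatcher

def serialize(record: dict) -> str:
    """Flatten a record into text; the COL/VAL layout is an assumed format."""
    return " ".join(f"COL {k} VAL {v}" for k, v in record.items() if v)

def field_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def score(left: dict, right: dict, weights: dict) -> float:
    """Weighted average of per-field similarities (hypothetical weights)."""
    total = sum(weights.values())
    return sum(
        w * field_similarity(str(left.get(f, "")), str(right.get(f, "")))
        for f, w in weights.items()
    ) / total

def top1_match(query: dict, candidates: list, weights: dict, threshold: float = 0.8):
    """Keep same-make candidates, pick the best scorer, apply the threshold."""
    make = str(query.get("make", "")).lower()
    blocked = [c for c in candidates if str(c.get("make", "")).lower() == make]
    if not blocked:
        return None
    best = max(blocked, key=lambda c: score(query, c, weights))
    best_score = score(query, best, weights)
    return (best, best_score) if best_score >= threshold else None

# Illustrative records only; not drawn from the study's gold dataset.
reference = [
    {"make": "Boeing", "model": "737-800", "code": "B738"},
    {"make": "Boeing", "model": "747-400", "code": "B744"},
    {"make": "Airbus", "model": "A320", "code": "A320"},
]
weights = {"make": 1.0, "model": 2.0}
query = {"make": "BOEING", "model": "737-800"}
match = top1_match(query, reference, weights)
```

In a learned matcher, `score` would be replaced by the classifier's match probability for the serialized pair, while the blocking and top-1 steps could stay the same.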
