Enhancing Aircraft Type Matching Across Data Sources Using Entity Resolution

dc.contributor.author: Bryan, Karna
dc.date.accessioned: 2026-04-15T01:04:28Z
dc.date.available: 2026-04-15T01:04:28Z
dc.date.issued: 2026-03
dc.description.abstract: This study addresses the challenge of merging aviation safety data from diverse sources that lack shared identifiers, which limits the ability to combine safety events and utilization information for risk assessment. Inconsistent aircraft type representations make it difficult to align records across systems at scale. The purpose of this constructive research study was to develop and evaluate an aircraft type entity matching capability that supports normalization to a common aircraft type reference representation. A manually curated gold dataset was constructed by mapping aircraft type taxonomies from the Federal Aviation Administration and the International Civil Aviation Organization. Using this labeled dataset, a deterministic baseline matcher was implemented with feature similarity scoring and thresholding. Then, a deep learning matcher was trained using a transformer-based pairwise classification approach that serializes record fields into a textual representation. Experiments evaluated negative sampling strategies, schema variation across sources, and multi-source union training. Performance was assessed on held-out test partitions and under a top-1 linkage selection strategy. In-domain transfer was assessed on additional “dirty” aviation datasets using make-constrained candidate generation, manual audits, and a fully labeled Bureau of Transportation Statistics case to quantify transfer versus source-specific fine-tuning. Results showed that the deep learning approach outperformed the deterministic baseline, remained effective under schema variation, and showed promising transfer behavior. The findings indicate that deep learning approaches provide a practical foundation for aircraft type normalization across heterogeneous sources, enabling cross-source safety analyses that were previously difficult to conduct.
Future work should prioritize automated blocking and standardization, broader labeling for additional in-domain sources, and systematic study of negative candidate selection under deployment-like candidate distributions.
dc.identifier.uri: https://hdl.handle.net/20.500.11803/5264
dc.language.iso: en
dc.publisher.institution: National University (NU)
dc.subject: deep learning
dc.subject: aviation safety
dc.subject: entity resolution
dc.subject: Business, Engineering, Science, & Technological Innovation
dc.title: Enhancing Aircraft Type Matching Across Data Sources Using Entity Resolution
dc.type: Dissertation
thesis.degree.discipline: Data Science
thesis.degree.grantor: National University (NU)
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle
Name: Bryan_updated.pdf
Size: 2.31 MB
Format: Adobe Portable Document Format