Are We Training with The Right Data? Evaluating Collective Confidence in Training Data using Dempster Shafer Theory

Dey, Sangeeta; Lee, Seok Won

DC Field	Value	Language
dc.contributor.author	Dey, Sangeeta	-
dc.contributor.author	Lee, Seok Won	-
dc.date.issued	2022-01-01	-
dc.identifier.uri	https://aurora.ajou.ac.kr/handle/2018.oak/36811	-
dc.identifier.uri	https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85132965104&origin=inward	-
dc.description.abstract	The recent trend of incorporating data-centric machine learning (ML) models into software-intensive systems has posed new challenges for quality assurance practice in software engineering, especially in high-risk environments. ML experts are now focusing on explaining ML models to assure the safe behavior of ML-based systems. However, not enough attention has been paid to explaining the inherent uncertainty of the training data. The current practice of ML-based system engineering lacks transparency in systematically assessing the fitness of the training data before engaging in rigorous ML model training. We propose a method of assessing the collective confidence in the quality of a training dataset by using Dempster-Shafer theory and a modified combination rule (Yager's rule). Using the example of training datasets for pedestrian detection in autonomous vehicles, we demonstrate how stakeholders with diverse expertise can use the proposed approach to combine their beliefs in the quality arguments and evidence about the data. Our results open up scope for future research on data requirements engineering that can facilitate evidence-based data assurance for ML-based safety-critical systems.	-
dc.description.sponsorship	This work was supported by the BK21 FOUR program of the National Research Foundation (NRF) of Korea funded by the Ministry of Education (NRF5199991014091) and the Basic Science Research Program through the NRF funded by the Ministry of Science and ICT (NRF-2020R1F1A1075605).	-
dc.language.iso	eng	-
dc.publisher	IEEE Computer Society	-
dc.subject.mesh	Data centric	-
dc.subject.mesh	Data uncertainty	-
dc.subject.mesh	Dempster-Shafer theory	-
dc.subject.mesh	High risk environment	-
dc.subject.mesh	Machine learning models	-
dc.subject.mesh	Machine-learning	-
dc.subject.mesh	Quality assurance practices	-
dc.subject.mesh	Software intensive systems	-
dc.subject.mesh	Training data	-
dc.subject.mesh	Training dataset	-
dc.title	Are We Training with The Right Data? Evaluating Collective Confidence in Training Data using Dempster Shafer Theory	-
dc.type	Conference	-
dc.citation.conferenceDate	2022.05.22.~2022.05.27.	-
dc.citation.conferenceName	44th ACM/IEEE International Conference on Software Engineering: New Ideas and Emerging Results, ICSE-NIER 2022	-
dc.citation.edition	Proceedings - 2022 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results, ICSE-NIER 2022	-
dc.citation.endPage	15	-
dc.citation.startPage	11	-
dc.citation.title	Proceedings - International Conference on Software Engineering	-
dc.identifier.bibliographicCitation	Proceedings - International Conference on Software Engineering, pp.11-15	-
dc.identifier.doi	10.1109/icse-nier55298.2022.9793521	-
dc.identifier.scopusid	2-s2.0-85132965104	-
dc.subject.keyword	data uncertainty	-
dc.subject.keyword	Dempster Shafer theory	-
dc.subject.keyword	machine learning	-
dc.subject.keyword	safety	-
dc.type.other	Conference Paper	-
dc.identifier.pissn	02705257	-
dc.description.isoa	false	-
dc.subject.subarea	Software	-
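
The abstract refers to combining stakeholder beliefs with Yager's modification of Dempster's combination rule. As a rough illustration only, the sketch below shows how that rule is commonly stated: products of masses whose focal sets intersect are added to the intersection, and the conflicting mass is assigned to the whole frame of discernment instead of being normalized away. The hypothesis labels ("adequate", "inadequate"), the two stakeholder roles, and all mass values are hypothetical and are not taken from the paper; the function name yager_combine is likewise an assumption made for this example.

```python
from itertools import product


def yager_combine(m1, m2, frame):
    """Combine two mass functions using Yager's rule.

    m1, m2: dicts mapping frozenset (focal element) -> mass, each summing to 1.
    frame:  frozenset containing every hypothesis (the frame of discernment).
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence supports the intersection of the focal sets.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint focal sets contribute to the conflict.
            conflict += mass_a * mass_b
    # Yager's rule: conflict is treated as ignorance and moved to the frame,
    # whereas Dempster's rule would divide the other masses by (1 - conflict).
    combined[frame] = combined.get(frame, 0.0) + conflict
    return combined


# Hypothetical beliefs about whether a training dataset is adequate.
ADEQUATE = frozenset({"adequate"})
INADEQUATE = frozenset({"inadequate"})
FRAME = ADEQUATE | INADEQUATE

m_ml_engineer = {ADEQUATE: 0.7, FRAME: 0.3}  # assumed values
m_safety_analyst = {ADEQUATE: 0.5, INADEQUATE: 0.3, FRAME: 0.2}  # assumed values

result = yager_combine(m_ml_engineer, m_safety_analyst, FRAME)
```

With these assumed inputs, the combined masses come out to roughly 0.64 for "adequate", 0.09 for "inadequate", and 0.27 for the whole frame; the 0.21 of conflicting mass ends up as uncertainty rather than inflating either hypothesis.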

Related Researcher

Lee, Seok-Won (이석원): Department of Software and Computer Engineering
