Future Internet, Vol. 15, No. 9 (2023)
ARTICLE
TITLE

On Evaluating IoT Data Trust via Machine Learning

Timothy Tadj, Reza Arablouei and Volkan Dedeoglu

Abstract

Data trust in IoT is crucial for safeguarding privacy and security, supporting reliable decision-making, fostering user acceptance, and complying with regulations. Various approaches based on supervised or unsupervised machine learning (ML) have recently been proposed for evaluating IoT data trust. However, assessing their real-world efficacy is hard, mainly because of the lack of publicly available datasets that can be used for benchmarking. Since obtaining such datasets is challenging, we propose a data synthesis method, called random walk infilling (RWI), to augment IoT time-series datasets by synthesizing untrustworthy data from existing trustworthy data. RWI thus enables us to create labeled datasets that can be used to develop and validate ML models for IoT data trust evaluation. We also extract new features from IoT time-series sensor data that effectively capture its autocorrelation as well as its cross-correlation with the data of the neighboring (peer) sensors. These features can be used to train ML models that recognize the trustworthiness of IoT sensor data. Equipped with our synthesized ground-truth-labeled datasets and informative correlation-based features, we conduct extensive experiments to critically examine various approaches to evaluating IoT data trust via ML. The results reveal that commonly used ML-based approaches to IoT data trust evaluation, which rely on unsupervised cluster analysis to assign trust labels to unlabeled data, perform poorly. This poor performance stems from the underlying assumption that clustering provides reliable labels for data trust, which is found to be untenable. The results also indicate that ML models, when trained on datasets augmented via RWI and using the proposed features, generalize well to unseen data and surpass existing related approaches. Moreover, we observe that a semi-supervised ML approach that requires only about 10% of the data to be labeled offers competitive performance while being more appealing in practice than the fully supervised approaches. The related Python code and data are available online.
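
The abstract does not spell out how RWI works, so the following is only a minimal sketch of one plausible reading of the idea: a segment of a trustworthy series is overwritten with a Gaussian random walk that starts from the last untouched sample, and the overwritten samples are labeled as untrustworthy. The function names `random_walk_infill` and `lag1_autocorr`, the parameters `start`, `length`, and `step_std`, and the synthetic sine-wave signal are all assumptions made for illustration; they are not taken from the authors' code. The lag-1 autocorrelation stands in for the richer correlation-based features the abstract mentions.

```python
import numpy as np


def random_walk_infill(series, start, length, step_std, rng):
    """Hypothetical RWI-style augmentation (an assumption, not the paper's code).

    Replaces series[start:start+length] with a Gaussian random walk seeded at
    series[start-1]. Returns the augmented copy and 0/1 labels, where 1 marks
    synthesized (untrustworthy) samples.
    """
    series = np.asarray(series, dtype=float)
    if start < 1 or start + length > series.size:
        raise ValueError("segment must lie inside the series and start after index 0")
    augmented = series.copy()
    labels = np.zeros(series.size, dtype=int)
    steps = rng.normal(0.0, step_std, size=length)
    # The walk continues from the last trustworthy value before the segment.
    augmented[start:start + length] = series[start - 1] + np.cumsum(steps)
    labels[start:start + length] = 1
    return augmented, labels


def lag1_autocorr(x):
    """Pearson correlation between a window and itself shifted by one sample."""
    x = np.asarray(x, dtype=float)
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])


# Synthetic "trustworthy" signal: a slow sinusoid around 20 (illustrative only).
rng = np.random.default_rng(0)
t = np.arange(200)
clean = 20.0 + 2.0 * np.sin(2 * np.pi * t / 50)

augmented, labels = random_walk_infill(clean, start=80, length=40, step_std=0.3, rng=rng)

# A simple autocorrelation feature on the clean signal and on the infilled segment.
clean_feature = lag1_autocorr(clean)
infill_feature = lag1_autocorr(augmented[80:120])
```

Under these assumptions, `labels` provides the ground truth that a supervised or semi-supervised classifier could be trained on, and features such as `lag1_autocorr` would be computed per window for each sensor.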

Similar articles

       
 
Giovanni Tardioli, Ricardo Filho, Pierre Bernaud and Dimitrios Ntimos    
The estimation of indoor thermal comfort and the associated occupant feedback in office buildings is important to provide satisfactory and safe working environments, enhance the productivity of personnel, and reduce complaints. The assessment of therm...
Journal: Buildings

 
Jianhua Liu and Zibo Wu    
The cloud-based Internet of Things (IoT-Cloud) combines the advantages of the IoT and cloud computing, which not only expands the scope of cloud computing but also enhances the data processing capability of the IoT. Users always seek affordable and effic...
Journal: Future Internet

 
Kostas Kolomvatsos and Christos Anagnostopoulos    
Pervasive computing applications deal with the intelligence surrounding users that can facilitate their activities. This intelligence is provided in the form of software components incorporated in embedded systems or devices in close proximity to end us...
Journal: IoT

 
Mahdi H. Miraz, Maaruf Ali, Peter S. Excell and Richard Picking    
The current status and future promise of the Internet of Things (IoT), Internet of Everything (IoE) and Internet of Nano-Things (IoNT) are extensively reviewed and a summarized survey is presented. The analysis clearly distinguishes between IoT and Io...
Journal: Future Internet