Article

Fine-Grained Identification for Large-Scale IoT Devices: A Smart Probe-Scheduling Approach Based on Information Feedback

College of Computer, National University of Defense Technology, Changsha 410073, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(16), 8335; https://doi.org/10.3390/app12168335
Submission received: 18 July 2022 / Revised: 12 August 2022 / Accepted: 19 August 2022 / Published: 20 August 2022

Abstract

A large number of IoT devices access the Internet. While enriching our lives, IoT devices bring potential security risks. Device identification is one effective way to mitigate security risks and manage IoT assets. Typical identification algorithms generally separate data capture and target identification into two parts. As a result, it is inefficient and coarse-grained to evaluate the results only once the identification process is complete and then adjust the data capture strategy afterward. To solve this problem, we propose a fine-grained probe-scheduling approach based on information feedback. First, we model the probe surface as three layers for IoT devices and define their relationships. Then, we improve the policy gradient algorithm to optimize the probe policy and generate the optimal probe sequence for the target device. We implement a prototype system and evaluate it on 53,000 IoT devices across various categories to show its wide applicability. The results indicate that our approach can achieve success rates of 96.89%, 93.43%, and 83.71% for device brand, model, and firmware version, respectively, and reduce the identification time by 55.96%.

1. Introduction

With the rapid development of the Internet of Things (IoT), a large number of devices have accessed the Internet. According to Global System for Mobile Communications Association (GSMA) statistics, the number of IoT devices will reach almost 25 billion globally by 2025, up from 10.3 billion in 2018 [1]. While enriching our lives, IoT devices bring potential security risks to cyberspace, such as information leakage, authentication bypass, and lagging firmware upgrades [2]. As one method of cyberspace mapping, device identification can help to mitigate security risks and reduce attack surfaces.
The device fingerprint refers to device features that can be used to identify a device. According to the identification granularity, the device fingerprint includes the device’s brand, type, model, firmware version, and other features. Common cyber search engines, such as Shodan [3] and Censys [4], generally adopt a banner-based approach to identify online IoT devices. While Shodan can reach 95% identification accuracy, the low recall results in coarse identification granularity.
In general, existing identification methods have three disadvantages. First, manual identification is time-consuming and insufficiently accurate, making it difficult to establish comprehensive information from the physical layer to the application layer. Second, traversing all the device features increases the communication overhead and may trigger intrusion detection mechanisms [5,6,7]. Third, typical identification algorithms separate data capture and target identification into two parts. Evaluating the results only after the identification process is complete, and adjusting the data collection strategy only then, is inefficient and coarse-grained.
When identifying large-scale IoT devices, protocol handshake and data transmission (typically milliseconds) consume most of the runtime compared with the identification time (typically microseconds). In the standard reinforcement learning (RL) setting, the agent receives feedback from the environment at every step and chooses an action based on that feedback [8]. Interactions between the agent and the environment allow determining whether the information received is sufficient to complete the identification task and adjust the data collection policy accordingly. In this way, data capture and target identification can be performed simultaneously, avoiding sending additional probe requests. Therefore, a probe-scheduling approach with information feedback can balance the success rate and the communication overhead of device identification.
In this paper, we propose a fine-grained probe-scheduling approach based on information feedback to identify large-scale IoT devices. We aim to improve identification success rates and efficiency for large-scale IoT devices. First, we model the probe surface as three layers for IoT devices and define their relationships. Then, we improve the policy gradient (PG) algorithm to optimize the probe policy and generate the optimal probe sequence for the target device.
We implement a prototype system and evaluate it through real-world experiments to validate our approach. We use the Shodan API [3] to collect response data (open ports, protocol response data, and web feature information) from 53,000 real IoT devices. The dataset covers a wide range of device categories. Thus, our approach has wide applicability, i.e., different types of IoT devices can achieve high identification efficiency and success rates using our approach.
Overall, our contributions are summarized as follows:
  • We model the probe surface as three layers by analyzing the characteristics of IoT devices and define their sequential relationships.
  • We propose a fine-grained probe-scheduling approach based on information feedback to achieve high identification efficiency and success rates. Using the improved RL algorithm, we update the identification state dynamically and select the next action with the greatest benefit.
  • We implement a prototype system and evaluate it on 53,000 IoT devices across various categories. The results show that our approach can achieve success rates of 96.89%, 93.43%, and 83.71% for device brand, model, and firmware version, respectively. Furthermore, our approach reduces the identification time by 55.96% compared with that of the protocol-popularity method.
  • We have released all data and the analysis script to replicate the results of this work and to encourage further studies: https://github.com/sherlocklchen/real-IoT-device-assets.
The remainder of this paper is organized as follows. Section 2 discusses the related work. Section 3 introduces our motivation. Section 4 describes the framework and algorithm for large-scale IoT devices. Section 5 presents the experimental evaluation. Section 6 discusses the capabilities and limitations of our approach. Finally, Section 7 concludes.

2. Related Work

In network security, IoT device identification has been used for more than two decades, and there are many related works. On the one hand, device identification can help operators sort out the devices running in the network to find information leaked due to configuration errors. On the other hand, vulnerabilities in IoT devices are usually related to the properties of the device (brand, type, model, etc.) and identifying the device correctly will help operators block known vulnerable devices [9] before they do harm to the network. The main solutions to identify devices can be divided into three categories.
According to the identification granularity, a device fingerprint includes the brand, product model, and firmware version. For example, an IoT device is produced by a brand (e.g., Cisco, Sony), has a product model (e.g., ASR-900 or ASA-5520), and several firmware versions (e.g., 1.04, 3.40). With numerous types of IoT devices, it is difficult to enumerate all fingerprints manually. In prior works [10,11,12,13,14,15,16], traditional, traffic-based, and banner-based approaches have been used to discover and manage IoT devices.

2.1. Traditional Detection Methods

Traditional detection methods focus on identifying the operating system by analyzing TCP/IP protocol characteristics. Nmap [17] sends detection packets to the target device [18] and constructs device fingerprints based on the characteristics of the response data. Zmap [19] is a fast single-packet network scanner that can scan the entire public Internet in less than an hour, displaying information about nearly four billion online devices.
Cheng et al. [20] relied on the hardware differences between the CPU modules of different devices to detect and identify them; Park et al. [21] distinguished different devices based on the inherent hardware characteristics of embedded systems; Sanchez-Rola et al. [22] computed a hardware fingerprint by timing the execution of sequences of instructions readily available in API functions.
The identification success rate for a limited number of operating systems is acceptable, while the rate will drop significantly for the wide variety of IoT devices.

2.2. Traffic-Based Methods

To identify devices, some researchers have collected and analyzed traffic data. Miettinen et al. [10] used machine learning to distinguish the types of smart devices. The fingerprint is represented by n data packets and 23 features (such as packet length, port number, and protocol used by the packet), encoded as binary features, which can achieve high accuracy. Wang et al. [12] designed a port scanning strategy that combines multiple weak classifiers into multiple classifiers. Each classifier is responsible for analyzing specific port data, which greatly shortens the cycle of device identification and increases the identification accuracy by 46.67%. Yu et al. [11] used Convolutional Neural Networks (CNN) and Long Short-Term Memory Networks (LSTM) to extract and construct the characteristic fingerprints of HTTP and TCP cross-layer data packets to achieve high-precision and fine-grained IoT device identification. In [13,23,24,25,26,27,28,29,30], an inspection of data packets was used to extract device features.
The authors of [10,31,32] proposed mechanisms for analyzing encrypted traffic. The mechanism proposed in IoT Sentinel [10] uses a flow attribute vector of 276 dimensions (12 groups × 23 features), which incurs an excessively high computational cost [33]. The mechanism proposed in [31] requires 49 traffic attributes and 30,000 frames to identify the device, and it takes a long time to capture the traffic.
Although applying machine learning to network traffic improves the automation of device identification, these methods cannot identify the device model, so their granularity remains coarse.

2.3. Banner-Based Methods

The term banner refers to the device attributes contained in the protocol packets, which typically include the device type, brand, and model [16,32,34,35]. Obtaining protocol banners requires first sending probe packets for specific services (i.e., ports) to the target device. If the target device runs the particular service, it will return response packets containing device information. Since IoT devices run a large number of services, we can obtain a wide variety of banners to improve the accuracy of device identification.
Yu et al. [11] proposed a cross-layer protocol fingerprinting technique for fine-grained device identification. This approach utilized a convolutional neural network (CNN) and a long short-term memory network (LSTM) to extract and construct feature fingerprints. Nevertheless, this proactive identification method increases the identification time because it depends on the network state.
Qiang et al. [15] proposed an approach for generating fine-grained fingerprints based on the subtle differences between the file systems of various firmware images. They leveraged natural language processing to process the file content and the document object model to obtain firmware fingerprints. The recall and precision of the firmware fingerprints exceeded 90%. However, this approach requires an average of 75 HTTP packets to identify the firmware version of a single device, which is inefficient for large-scale device identification.
Xuan et al. [36] proposed a scalable framework for physical device profiling that leverages banner grabbing to identify device types and running services before using clock skew to determine a device ID. Although they used multiple protocols to improve the identification accuracy, the approach only ranks the popularity of application layer protocols to identify device types, which increases the communication costs. By contrast, our method balances the success rate with the communication overhead by scheduling multiple probe methods.
However, there are two shortcomings of banner-based device identification: (1) The target device may not return the corresponding protocol banner after sending a protocol probe packet to the target; (2) The device type can be easily extracted from the banners, but most banners contain incomplete information about device properties, such as brand and model. Combining the various protocol banners supported by the device for device identification will increase the communication and time overhead considerably.

3. Motivation

By observing and analyzing fingerprint features, we model the probe surface for IoT devices as three layers and define their relationships.

3.1. Port Layer

The port layer refers to the open ports, which are associated with the host’s communication protocols and specific service types. Unlike traditional hosts, most IoT devices are made for specific tasks, so their open ports can reflect the device fingerprint. In Appendix A, we show the default open ports for the main IoT device manufacturers. For example, webcams need to receive control data and send image data via the network. Therefore, they use the Real Time Streaming Protocol (RTSP) and the Onvif protocol.
Table 1 shows three typical categories of devices: webcam, network printer, and network-attached storage (NAS). Several unique ports are used by only one type of device. For example, Dahua uses port 37777 to run its private protocol. Thus, we can perform coarse-grained identification for IoT devices based on the extracted port features as the first step.

3.2. Protocol-Response Layer

The protocol-response layer contains response data from protocols such as HTTP and SSH. Obtaining protocol responses requires sending probe packets to the target device. If the target device runs the particular service, it will return a response packet containing a device fingerprint. For example, as shown in Figure 1, a Cisco device opens the 80_HTTP and 23_Telnet ports. The Telnet protocol response contains information about the device model and firmware version, while the information about the device brand appears in the HTTP protocol response. Therefore, both 80_HTTP and 23_Telnet responses should be captured to obtain the complete device fingerprint.

3.3. Web-Feature Layer

Web features refer to the features of web applications (e.g., special URLs, SSL certificates). IoT devices typically use the Linux-based file system that contains tens of thousands of files. The “WWW” directory contains files that can be accessed through the web. These files can be used to identify the model and firmware version. As shown in Figure 2, the feature URLs and SSL certificate provide information about the device brand and model.

3.4. Dependencies among the Probe Surfaces

There are hierarchical dependencies among the three-layer probe surface. Individual ports or port combinations can be used to identify the device brand but not the model or firmware version. The protocol-response layer must determine whether the port is open before probing the protocol response. Protocol responses can be used to identify the model and firmware version. The web-feature layer must determine whether the target device opens the HTTP/HTTPS protocol port (such as 80 and 443) and runs a web service. Then, web features such as special URLs, page content, and SSL certificates can be further gathered.
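These dependencies can be read as preconditions on probe actions. The following is a minimal sketch, assuming a hypothetical probe catalogue: `PROBE_PRECONDITIONS`, the probe names, and `available_probes` are illustrative and are not taken from the prototype. It keeps only the probes that are worth sending given the open ports found at the port layer.

```python
from typing import Dict, List, Set

# Hypothetical catalogue (an assumption for illustration): each probe lists
# the ports of which at least one must be open before the probe is sent.
# Port numbers are the defaults that appear in the text and Appendix A.
PROBE_PRECONDITIONS: Dict[str, Set[int]] = {
    "http_response": {80},
    "https_response": {443},
    "telnet_response": {23},
    "snmp_response": {161},
    "special_url": {80, 443},   # web-feature layer needs a web service
    "ssl_certificate": {443},
}


def available_probes(open_ports: Set[int]) -> List[str]:
    """Return the probes whose port precondition is met, in catalogue order."""
    return [
        name
        for name, required in PROBE_PRECONDITIONS.items()
        if required & open_ports  # non-empty intersection: a needed port is open
    ]
```

For a device with only ports 80 and 23 open, the web-feature probe for special URLs remains available, while the SSL certificate probe is filtered out because port 443 is closed.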
Device identification is a dynamic process. We can improve the RL algorithm to update the identification status and obtain the optimal probe sequence. Furthermore, interactions between the agent and the environment enable evaluation of whether the information received is sufficient to complete the identification task and adjust the capture policy accordingly. In this way, data capture and device identification can be performed simultaneously, preventing the need to send additional probe requests.
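The feedback idea can be sketched as a loop that stops as soon as the fingerprint is complete. This is an illustration under stated assumptions rather than the prototype's code: `identify`, `choose_next`, `send_probe`, and the fake responses are hypothetical names and values, and the default budget of five probes simply mirrors the point at which the success rate is reported to stabilize in Section 5.2.

```python
from typing import Callable, Dict, List, Tuple

FIELDS = ("brand", "model", "firmware")


def identify(
    candidates: List[str],
    choose_next: Callable[[Dict[str, str], List[str]], str],
    send_probe: Callable[[str], Dict[str, str]],
    max_probes: int = 5,
) -> Tuple[Dict[str, str], List[str]]:
    """Probe one step at a time, feeding each response back into the state."""
    state: Dict[str, str] = {}
    sent: List[str] = []
    remaining = list(candidates)
    while remaining and len(sent) < max_probes:
        if all(field in state for field in FIELDS):
            break  # fingerprint complete: no further probe requests are sent
        probe = choose_next(state, remaining)
        remaining.remove(probe)
        sent.append(probe)
        for field, value in send_probe(probe).items():
            state.setdefault(field, value)  # keep the first value seen
    return state, sent


# Toy stand-in for a device like the Cisco example: the brand comes from
# HTTP, the model and firmware version from Telnet. Values are made up.
FAKE_RESPONSES = {
    "http_response": {"brand": "Cisco"},
    "telnet_response": {"model": "ASA-5520", "firmware": "3.40"},
    "snmp_response": {},
}

state, sent = identify(
    ["http_response", "telnet_response", "snmp_response"],
    choose_next=lambda current, remaining: remaining[0],  # placeholder policy
    send_probe=lambda probe: FAKE_RESPONSES[probe],
)
```

In this toy run the SNMP probe is never sent, because the HTTP and Telnet responses already complete the fingerprint. In the approach described here, the placeholder `choose_next` would be replaced by the learned probe policy.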

4. Framework

In this section, we present the framework and algorithm of fine-grained device identification. The framework has three main modules (see Figure 3): probing knowledge base module, data analysis module, and probe-scheduling module.

4.1. Probing Knowledge Base Module

The probing knowledge base consists of several probe plugins and a multidimensional fingerprint library. We use plugins to probe open ports, different protocol responses, and web features (e.g., special URLs, SSL certificates). We construct the fingerprint library automatically and update it regularly based on the structural features of device fingerprints displayed on websites.
The probing knowledge base module supports both the data analysis and the probe-scheduling modules.

4.2. Data Analysis Module

The data analysis module analyzes the response data from the three layers. We show the default open ports for the main IoT device manufacturers in Appendix A. The port-scan algorithm is detailed in Algorithm 1. When scanning open ports, we detect the characteristic ports first. If open, the device brand can be determined directly (e.g., 37777_Dahua, 2020_TP-Link). Then, we detect the characteristic port combinations. If open, the device brand can be determined (e.g., 81,82_Hikvision, 81,21_Axis). Finally, we detect the typical protocols. If the target device runs the typical protocols, the device type can be determined (e.g., RTSP_webcam, IPP_printer).
Algorithm 1: Port Scan Algorithm.
Input: $Obj$: IoT device to be identified;
Variables: $P_c$: characteristic ports for $Obj$;
           $P_{combine}$: characteristic port combinations for $Obj$;
           $P_{tp}$: typical protocols for $Obj$;
Output: $c$: device category;
Initialise $c = Null$;
if Isopen($Obj$, $P_c$) then
    $c$ = Devicebrand($P_c$)
else if Isopen($Obj$, $P_{combine}$) then
    $c$ = Devicebrand($P_{combine}$)
else if Isopen($Obj$, $P_{tp}$) then
    $c$ = Devicetype($P_{tp}$)
end
return $c$
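The three-stage check of Algorithm 1 can be sketched in Python as follows. The lookup tables contain only the examples named in the text (with RTSP and IPP mapped to their ports 554 and 631 from Appendix A); the function name and table names are assumptions made for illustration, and the open ports are passed in rather than scanned.

```python
from typing import Optional, Set

# Illustrative tables built from the examples in the text; a real
# fingerprint library would be far larger.
CHARACTERISTIC_PORTS = {37777: "Dahua", 2020: "TP-Link"}
CHARACTERISTIC_COMBINATIONS = {
    frozenset({81, 82}): "Hikvision",
    frozenset({81, 21}): "Axis",
}
TYPICAL_PROTOCOL_PORTS = {554: "webcam", 631: "printer"}  # RTSP, IPP


def port_scan_category(open_ports: Set[int]) -> Optional[str]:
    """Return a brand or a device type inferred from open ports, else None."""
    # Stage 1: a single characteristic port identifies the brand directly.
    for port, brand in CHARACTERISTIC_PORTS.items():
        if port in open_ports:
            return brand
    # Stage 2: a characteristic combination of ports identifies the brand.
    for combination, brand in CHARACTERISTIC_COMBINATIONS.items():
        if combination <= open_ports:
            return brand
    # Stage 3: a typical protocol only identifies the device type.
    for port, device_type in TYPICAL_PROTOCOL_PORTS.items():
        if port in open_ports:
            return device_type
    return None
```

The ordering matters: a device with ports 554 and 37777 open is reported as Dahua rather than as a generic webcam, because the more specific evidence is checked first.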
Figure 4 illustrates how the data analysis module processes the response data from the target device, including protocol responses and web-feature data. First, the detector actively sends probe messages to target devices to obtain different types of responses. Then, we process the response data and generate segmentation lists during the data processing stage. Finally, we obtain the device fingerprint from the segmentation lists using regular-expression and keyword matching.
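A minimal sketch of the segmentation and matching steps is shown below. The keyword list, the two regular expressions, and the sample banner are hypothetical and chosen only to match the brand and model examples used earlier in the paper; the prototype's fingerprint library is not reproduced here.

```python
import re
from typing import Dict, List, Optional

# Hypothetical matching rules (assumptions for illustration only).
BRAND_KEYWORDS = {"cisco": "Cisco", "hikvision": "Hikvision", "dahua": "Dahua"}
MODEL_RE = re.compile(r"(?:ASA|ASR)-\d{3,4}")
VERSION_RE = re.compile(r"\d+(?:\.\d+)+")


def segment(response: str) -> List[str]:
    """Split raw response text into a list of non-empty tokens."""
    return [tok for tok in re.split(r"[\s,;:()<>]+", response) if tok]


def extract_fingerprint(response: str) -> Dict[str, Optional[str]]:
    """Match keywords and regular expressions against the segmented tokens."""
    fingerprint: Dict[str, Optional[str]] = {
        "brand": None, "model": None, "firmware": None,
    }
    for token in segment(response):
        if fingerprint["brand"] is None and token.lower() in BRAND_KEYWORDS:
            fingerprint["brand"] = BRAND_KEYWORDS[token.lower()]
        elif fingerprint["model"] is None and MODEL_RE.fullmatch(token):
            fingerprint["model"] = token
        elif fingerprint["firmware"] is None and VERSION_RE.fullmatch(token):
            fingerprint["firmware"] = token
    return fingerprint
```

Note that the version pattern is deliberately naive: it would also accept a dotted IP address, so a production rule set would need more context around each match.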
We can flexibly extend and customize the data processing module according to the application scenario. For example, when deployed on edge devices with scarce computational resources, some infrequent probe requests can be removed based on the possible device types in the current environment.

4.3. Probe-Scheduling Module

While identifying device fingerprints, we need to send N probe requests one by one to obtain the device information. The optimal probe sequence for IoT devices can obtain device responses containing more fingerprint information while sending as few probe requests as possible. Different probe methods have different benefits for device identification, and the order in which probe requests are sent can affect the identification benefits dynamically.
We model the scheduling problem as a Markov decision process denoted as $\langle S, A, r, \gamma, T \rangle$ [8]. The goal is to maximize the expected discounted return:
$$J = \mathbb{E}_{\tau}\left[\sum_{t=0}^{T-1} \gamma^{t} r_{t}\right]$$
where $\tau$ is the trajectory $(s_0, a_0, r_0, s_1, \ldots, s_{T-1}, a_{T-1}, r_{T-1})$ and $r_t = r(s_t, a_t)$. The core idea behind the PG algorithm is to obtain the policy gradient $\nabla_{\theta} J$ of the expected discounted return with respect to the policy parameter $\theta$.
$$g_{policy} = \mathbb{E}_{\tau}\left[\nabla_{\theta} \log \pi_{\theta}(a_{\tau} \mid s_{\tau})\, G_{\tau}\right] = \mathbb{E}_{\tau}\left[\nabla_{\theta} \sum_{t=0}^{T-1} \log \pi_{\theta}(a_t \mid s_t)\, G_t\right]$$
where $G_t = \sum_{k=0}^{\infty} \gamma^{k} r_{t+k}$ denotes the discounted return following time $t$.
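To make the discounted return concrete, the sketch below computes it for every step of one finite episode using the usual backward recursion, in which each return equals the current reward plus the discounted return of the next step. The function name and the example numbers are illustrative assumptions; rewards after the end of the episode are treated as zero.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Return [G_0, ..., G_{T-1}] for one episode of T rewards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each G_t reuses G_{t+1}: G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Three probes with rewards 10, 50, and 100 and a discount factor of 0.5.
example = discounted_returns([10.0, 50.0, 100.0], gamma=0.5)
# example == [60.0, 100.0, 100.0]
```

The first entry, 60 = 10 + 0.5 × 50 + 0.25 × 100, is the discounted return of the whole trajectory, which is the quantity averaged in the objective.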
The scheduling algorithm is detailed in Algorithm 2. The state set S refers to the identification state of the device, including the brand, model, and firmware version. The executable action set A includes probe actions such as probing SNMP responses and probing special URLs.
Algorithm 2: Scheduling Algorithm. (The listing is an image in the source article and is not reproduced here.)
The symbol $r$ indicates the immediate reward provided by the change in identification status after sending the probe request. According to the identification granularity and the model convergence in the experiment, we set the reward function $r$. If the device identification state remains unchanged, $r = 5$; if the brand information of the device is added, $r = 10$; if the model information is added, $r = 50$; if the firmware version information is added, $r = 100$.
The agent’s task is to learn a strategy $\pi: s \to a$ to choose the next action $a_t$ based on the current state $s_t$, i.e., $\pi(s_t) = a_t$. At each discrete time $t$, the agent perceives the current state $s_t$ and chooses the current action $a_t$ according to $s_t$. After obtaining the reward $r_t = R(s_t, a_t)$, the agent generates the subsequent state $s_{t+1} = \delta(s_t, a_t)$, which is related to only the current state. Moreover, the optional probe actions differ among devices due to the different open ports. Therefore, we use the MASK [37] to obtain the available action set $A_{valid}$ from the action set $A$ for valid policy gradient updates.
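One common way to realize a masked policy-gradient update is a softmax over the valid actions only, combined with a REINFORCE-style step. The sketch below uses a tabular softmax policy in plain Python; it is not the improved algorithm of Algorithm 2, and the names `masked_softmax` and `reinforce_update`, the tabular parameterization, and the values of `gamma` and `lr` are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

# One step of an episode: (state, action index, reward, valid-action mask).
Step = Tuple[str, int, float, List[bool]]


def masked_softmax(logits: List[float], valid: List[bool]) -> List[float]:
    """Softmax over valid actions; masked actions get probability 0.

    Assumes at least one action is valid.
    """
    top = max(l for l, ok in zip(logits, valid) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, valid)]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(
    theta: Dict[str, List[float]],
    episode: List[Step],
    gamma: float = 0.9,
    lr: float = 0.01,
) -> None:
    """In place: theta += lr * sum_t G_t * grad log pi(a_t | s_t)."""
    grads = {s: [0.0] * len(row) for s, row in theta.items()}
    g = 0.0
    for state, action, reward, valid in reversed(episode):
        g = reward + gamma * g  # discounted return G_t
        probs = masked_softmax(theta[state], valid)
        for a, p in enumerate(probs):
            if valid[a]:
                # Gradient of log-softmax w.r.t. logit a: 1[a == a_t] - pi(a | s).
                grads[state][a] += g * ((1.0 if a == action else 0.0) - p)
    for s, row in grads.items():
        for a, value in enumerate(row):
            theta[s][a] += lr * value


theta = {"s0": [0.0, 0.0, 0.0]}
reinforce_update(theta, [("s0", 1, 100.0, [True, True, False])])
# theta["s0"] is now approximately [-0.5, 0.5, 0.0]
```

Because masked actions receive zero probability and zero gradient, probes that are unavailable on a given device never influence the update for that device.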

5. Implementation and Evaluation

We implemented a prototype system and conducted real-world experiments to validate the identification capability. We collected response data (port open, protocol response data, and web feature information) from 53,000 real IoT devices using the Shodan API [3]. As shown in Table 2, the dataset captures common IoT device brands well, covering a wide range of device categories. The ratio of training data to test data in our experiments was 9:1.
As mentioned earlier, cyber search engines can identify device type with an accuracy of over 95%. Our approach focuses on fine-grained identification, such as device brand, model, and firmware version. We utilize the dataset to implement model training and to evaluate the success rate and efficiency of identification. Then, we validate the identification capability of our approach by comparing it with another method based on protocol popularity.

5.1. Data Set Analysis

We first calculate the information acquisition rate for different probe methods. The rate for probe method $i$ equals $n_i / N$, $i \in [1, 12]$, where $n_i$ is the number of devices successfully identified by probe method $i$, and $N$ is the total number of devices. Figure 5 shows the probability of obtaining device fingerprints using different probe methods. We found the following patterns in identifying devices:
  • A single method cannot identify complete fingerprints. For example, probing using the HTTP/HTTPS protocol has a high probability of obtaining brand and model information but a lower probability of obtaining firmware version.
  • Different response data contain different device fingerprint information. If a protocol response does not contain the firmware version, it is more likely that the rest of the protocol responses will contain this information.
  • Different combinations of probe methods have different complementarity in identifying device fingerprints. For example, the probability of identifying firmware version via the SNMP, HTTPS, and SIP protocols is 73.02%, 15.90%, and 50.82%, respectively. Although the probability of identifying the firmware version is only 15.90% using HTTPS, it can reach 80.31% when combined with SNMP, which is higher than the probability of 78.17% achieved by combining the SNMP and SIP protocols.
In other words, the complementarity between the SNMP and HTTPS protocols is higher than that between the SNMP and SIP protocols because the HTTPS and SNMP protocols’ responses have less duplicate fingerprint information. Therefore, the communication overhead can be significantly reduced by optimizing the probe methods and their order.
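The complementarity argument can be checked with inclusion-exclusion on the reported percentages. The sketch below assumes that all five figures are measured over the same device population, so that the share of devices covered by both methods is the sum of the individual rates minus the combined rate; the helper names are illustrative.

```python
def overlap(p_a: float, p_b: float, p_union: float) -> float:
    """Share of devices for which both methods yield the firmware version."""
    return p_a + p_b - p_union


def marginal_gain(p_first: float, p_union: float) -> float:
    """Extra share gained by adding the second method to the first."""
    return p_union - p_first


# Firmware-version identification rates reported above, in percent.
P_SNMP, P_HTTPS, P_SIP = 73.02, 15.90, 50.82
P_SNMP_HTTPS, P_SNMP_SIP = 80.31, 78.17

https_overlap = overlap(P_SNMP, P_HTTPS, P_SNMP_HTTPS)  # about 8.61
sip_overlap = overlap(P_SNMP, P_SIP, P_SNMP_SIP)        # about 45.67
https_gain = marginal_gain(P_SNMP, P_SNMP_HTTPS)        # about 7.29
sip_gain = marginal_gain(P_SNMP, P_SNMP_SIP)            # about 5.15
```

Under that assumption, roughly 90% of the devices identified through SIP (45.67 of 50.82 points) are already covered by SNMP, compared with about 54% for HTTPS (8.61 of 15.90 points), which is why HTTPS adds more despite its lower standalone rate.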

5.2. Evaluation

For 53,000 devices, we examine the identification capability via 10-fold cross-validation. Figure 6 shows how the success rate changes with the maximum number of probe methods. The X-axis represents the maximum number of probe methods, and the Y-axis represents the success rate. The success rate equals $N_{get} / N_{all}$, where $N_{all}$ is the number of target devices, and $N_{get}$ is the number of devices that return responses containing device fingerprint information.
Our approach selects the optimal probe method in the first step of identification and achieves a high success rate. Furthermore, the success rate stabilizes when the maximum number of probe methods reaches five. At this stage, the identification success rates for device brand, model, and firmware version reach 96.89%, 93.43%, and 83.71%, respectively. Therefore, when performing large-scale device identification, we can ignore the probe methods in the tail of the optimal probe sequence to improve efficiency.
Different types of IoT devices can use our approach to obtain high efficiency and success rates, i.e., our approach has wide applicability.

5.3. Success Rate and Time Efficiency Compared with Other Work

5.3.1. Success Rate Performance

Table 3 compares our approach with another approach based on protocol popularity. The results show that we can identify device fingerprints at a finer granularity (device model and firmware version) than [36]. In addition, we use no more than five probe methods, which reduces the communication overhead significantly.

5.3.2. Time Efficiency Performance

To validate that our approach can successfully balance the success rate with communication overhead, we compute the identification time of 5292 real IoT devices. We compare our approach with another approach based on protocol popularity. The idea of ranking protocol popularity is inspired by previous work [36]. We rank the probe methods as HTTP, HTTPS, RTSP, FTP, SSH, TELNET, SNMP, CWMP, and PPTP based on the number of responses.
Figure 7 shows how the identification success rate changes with the maximum number of probe methods for the two approaches. Figure 7a–c show how the success rate changes when identifying brand, model, and firmware version, respectively. Our scheduling policy achieves a higher identification success rate with fewer probe methods. Especially for firmware version identification, our approach reaches a success rate of 83.71% with a combination of no more than five probes. By contrast, the approach based on protocol popularity requires a combination of nine probes to achieve a similar success rate.
In addition, we calculate the time to identify 5292 real IoT devices. Our approach requires 10.89 min to achieve success rates of 96.89%, 93.43%, and 83.71% for the brand, model, and firmware version, respectively, while the popularity-based approach requires 24.73 min. Our approach can reduce the identification time by 55.96% when identifying large-scale IoT devices.
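As a quick check, the reported reduction follows directly from the two measured times, and dividing each time by the 5292 devices gives the average time per device:

```latex
\[
\frac{24.73 - 10.89}{24.73} = \frac{13.84}{24.73} \approx 0.5596 = 55.96\%
\]
\[
\frac{10.89 \times 60\ \text{s}}{5292} \approx 0.123\ \text{s per device},
\qquad
\frac{24.73 \times 60\ \text{s}}{5292} \approx 0.280\ \text{s per device}
\]
```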

6. Discussion and Limitations

In this section, we discuss the capabilities and limitations of our approach and explore directions for future improvement.

6.1. Ranks of Probing Actions

Our evaluation shows that the reward function r in Algorithm 2 is appropriate. The reward function could be extended to describe the benefits of a probing action more accurately. For example, different probing actions consume different amounts of communication time, so the action execution time, success rate, and other factors could be included as reward function parameters.

6.2. Scheduling Policy

There are three basic ways to implement an RL algorithm: value-based, policy-based, and model-based. Our evaluation shows that the policy-based algorithm is a reasonable choice. In future work, other RL algorithms, such as the Deep Q-Network (DQN), could be tested with our approach to compare the effects of different algorithms experimentally.

7. Conclusions

In this paper, we propose a fine-grained probe-scheduling approach based on information feedback to identify large-scale IoT devices. First, we model the probe surface as three layers for IoT devices and define their relationships. Then, we improve the policy gradient algorithm to optimize the probe policy and generate the optimal probe sequence for the target device. We implement a prototype system and evaluate its effectiveness through real-world experiments. Our approach can achieve success rates of 96.89%, 93.43%, and 83.71% for device brand, model, and firmware version, respectively, and it reduces the identification time by 55.96%.

Author Contributions

Conceptualization, C.L. and B.Y.; methodology, C.L.; software, W.X.; validation, C.L. and B.W.; data curation, C.L.; writing—original draft preparation, W.P.; writing—review and editing, C.L., B.Y., W.X., B.W. and W.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Natural Science Foundation of China (61902416, 61902412) and the Natural Science Foundation of Hunan Province in China (2019JJ50729).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

We will release all data and the analysis script to replicate the results of this work and to encourage further studies: https://github.com/sherlocklchen/real-IoT-device-assets.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
IoT | Internet of Things
RL | Reinforcement Learning
PG | Policy Gradient
GSMA | Global System for Mobile Communications Association
SNMP | Simple Network Management Protocol
HTTP | Hypertext Transfer Protocol
HTTPS | Hypertext Transfer Protocol over Secure Socket Layer
RTSP | Real Time Streaming Protocol
FTP | File Transfer Protocol
SSH | Secure Shell Protocol
TELNET | Telecommunication Network Protocol
CWMP | CPE WAN Management Protocol
PPTP | Point to Point Tunneling Protocol
IPP | Internet Printing Protocol
UPnP | Universal Plug and Play Protocol
Onvif | Open Network Video Interface Forum
NDMP | Network Data Management Protocol
NAS | Network Attached Storage
URL | Uniform Resource Locator
SSL | Secure Sockets Layer
CNN | Convolutional Neural Networks
LSTM | Long Short-Term Memory Networks
API | Application Programming Interface
DQN | Deep Q-Network

Appendix A. Special Port for IoT Devices

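Table A1. Special Port For Webcam.
Device Brand | Protocol Type | Ports
HIKVISION | HTTP/HTTPS | 81, 80, 82, 443, 8443
HIKVISION | RTSP | 554
HIKVISION | DateService | 8000
HIKVISION | Onvif | 80
Dahua | HTTP/HTTPS | 80, 8080, 443, 8443
Dahua | RTSP | 554
Dahua | DateService | 37777
Dahua | Onvif | 80
TP-Link | HTTP/HTTPS | 80, 443, 8080, 443
TP-Link | RTSP | 554
TP-Link | Onvif | 2020, 80
D-Link | HTTP/HTTPS | 80, 443, 8080
D-Link | RTSP | 554
D-Link | Onvif | 80
VIVOTEK | HTTP/HTTPS | 80, 443, 8080
VIVOTEK | RTSP | 554
VIVOTEK | Onvif | 80
VIVOTEK | FTP | 21
AXIS | HTTP/HTTPS | 80, 81, 8081, 8080
AXIS | RTSP | 554
AXIS | Onvif | 80
AXIS | FTP | 21
Panasonic | HTTP/HTTPS | 80, 443, 81
Panasonic | RTSP | 554
Panasonic | Onvif | 80
Panasonic | FTP | 21
Cisco | HTTP/HTTPS | 80, 443
Cisco | RTSP | 554
Cisco | FTP | 21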
Table A2. Special Port For Firewall.
Device Brand | Protocol Type | Ports
Cisco ASA | HTTP/HTTPS | 443, 80, 8443
Cisco ASA | SSH | 22
Cisco ASA | Telnet | 23
Fortinet Gate | HTTP/HTTPS | 10443, 443, 80, 8443
Fortinet Gate | SSH | 22
Fortinet Gate | Telnet | 23
Huawei | HTTP/HTTPS | 443, 8443, 80, 8888
Huawei | SSH | 22
Huawei | Telnet | 23
Huawei | SNMP | 161
D-Link DFL | HTTP/HTTPS | 80, 443, 8080, 8443
D-Link DFL | Telnet | 23
Ruijie | HTTP/HTTPS | 443, 80
Ruijie | Telnet | 23
Ruijie | SNMP | 161
Ruijie | SSH | 22
Table A3. Special Port For Router.
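Device Brand | Protocol Type | Ports
Cisco | HTTP/HTTPS | 80, 8080, 8081, 443
Cisco | SNMP | 161
Cisco | UPnP | 1900
Cisco | FTP | 21
Cisco | SSH | 22
Cisco | Telnet | 23
Netcore | HTTP/HTTPS | 8080, 8081, 443
Netcore | SNMP | 161
Netcore | UPnP | 1900
Netcore | FTP | 21
Netcore | SSH | 22
Netcore | Telnet | 23
Juniper | HTTP/HTTPS | 80, 443
Juniper | SNMP | 161
Juniper | UPnP | 1900
Juniper | FTP | 21
Juniper | SSH | 22
Juniper | Telnet | 23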
Table A4. Special Port For Printer.
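Device Brand | Protocol Type | Ports
Samsung | HTTP/HTTPS | 80, 8080, 8081, 443
Samsung | IPP | 631
Samsung | FTP | 21
Samsung | Telnet | 23
Samsung | SNMP | 161
Samsung | PJL | 9100, 9101, 9102
LexMark | HTTP/HTTPS | 80, 8000, 8080, 443
LexMark | IPP | 631
LexMark | FTP | 21, 9600
LexMark | Telnet | 23
LexMark | SNMP | 161
LexMark | PJL | 9100, 515
LexMark | Finger | 79
Dell | HTTP/HTTPS | 80, 443
Dell | IPP | 631
Dell | FTP | 21
Dell | Telnet | 9000, 23
Dell | SNMP | 161
Dell | PJL | 9100, 9101, 9102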
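
To illustrate how brand-specific ports such as those in Table A4 can narrow down candidates, the following minimal sketch ranks printer brands by how well a set of observed open ports matches each brand's listed ports. It is not the scheduler used in this work: the Jaccard similarity score and the names `PRINTER_PORTS`, `brand_ports`, and `rank_brands` are assumptions introduced only for this example, and the port lists are transcribed from Table A4.

```python
# Ports transcribed from Table A4 (printers); protocol labels follow the table.
PRINTER_PORTS = {
    "Samsung": {
        "HTTP/HTTPS": [80, 8080, 8081, 443],
        "IPP": [631],
        "FTP": [21],
        "Telnet": [23],
        "SNMP": [161],
        "PJL": [9100, 9101, 9102],
    },
    "LexMark": {
        "HTTP/HTTPS": [80, 8000, 8080, 443],
        "IPP": [631],
        "FTP": [21, 9600],
        "Telnet": [23],
        "SNMP": [161],
        "PJL": [9100, 515],
        "Finger": [79],
    },
    "Dell": {
        "HTTP/HTTPS": [80, 443],
        "IPP": [631],
        "FTP": [21],
        "Telnet": [9000, 23],
        "SNMP": [161],
        "PJL": [9100, 9101, 9102],
    },
}


def brand_ports(table, brand):
    """Flatten one brand's per-protocol port lists into a single set."""
    return {port for ports in table[brand].values() for port in ports}


def rank_brands(table, open_ports):
    """Rank brands by Jaccard similarity between observed and listed ports.

    Jaccard similarity is a hypothetical scoring choice for this sketch,
    not the policy-gradient scheduler described in the paper.
    """
    observed = set(open_ports)
    scores = []
    for brand in table:
        listed = brand_ports(table, brand)
        union = observed | listed
        score = len(observed & listed) / len(union) if union else 0.0
        scores.append((brand, score))
    # Highest similarity first.
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Example: a host exposing web, IPP, raw printing, Finger, and LPD ports.
ranking = rank_brands(PRINTER_PORTS, [80, 443, 631, 9100, 79, 515])
```

In this example the Finger port (79) and port 515 appear only in the LexMark rows, so LexMark ranks first. A port match of this kind is only a hint for choosing which probe to send next; it does not identify a device on its own.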

References

  1. GSM Association. IoT Connections Forecast: The Rise of Enterprise. Available online: https://www.gsma.com/iot/resources/iot-connections-forecast-the-riseof-enterprise/ (accessed on 15 November 2020).
  2. Park, M.; Oh, H.; Lee, K. Security risk measurement for information leakage in IoT-based smart homes from a situational awareness perspective. Sensors 2019, 19, 2148.
  3. Matherly, J. Complete Guide to Shodan; Shodan LLC: Pflugerville, TX, USA, 2015; Volume 1.
  4. Ribeiro, T.; Vala, M.; Paiva, A. Censys: A model for distributed embodied cognition. In Lecture Notes in Computer Science, Proceedings of the International Workshop on Intelligent Virtual Agents, Edinburgh, UK, 29–31 August 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 58–67.
  5. Feng, X.; Li, Q.; Wang, H.; Sun, L. Characterizing industrial control system devices on the Internet. In Proceedings of the International Conference on Network Protocols (ICNP), Singapore, 8–11 November 2016.
  6. Wang, S.; Bi, J.; Wu, J.; Vasilakos, A.V.; Fan, Q. VNE-TD: A virtual network embedding algorithm based on temporal-difference learning. Comput. Netw. 2019, 161, 251–263.
  7. Huang, M.; Liu, A.; Xiong, N.N.; Wang, T.; Vasilakos, A.V. A low-latency communication scheme for mobile wireless sensor control systems. IEEE Trans. Syst. Man Cybern. Syst. 2019, 49, 317–332.
  8. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018.
  9. Cisco. Big Security in a Small Business World 10 Myth Busters for SMB Cybersecurity; Cisco: San Jose, CA, USA, 2020.
  10. Miettinen, M.; Marchal, S.; Hafeez, I.; Asokan, N.; Sadeghi, A.R.; Tarkoma, S. IoT Sentinel: Automated device-type identification for security enforcement in IoT. In Proceedings of the 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), Atlanta, GA, USA, 5–8 June 2017; pp. 2177–2184.
  11. Yu, D.; Xin, H.; Chen, Y.; Ma, Y.; Chen, J. Cross-Layer Protocol Fingerprint for Large-Scale Fine-Grain Devices Identification. IEEE Access 2020, 8, 176294–176303.
  12. Wang, X.; Huang, J.; Qi, C. FDI: A Fast IoT Device Identification Approach. In Proceedings of the 2020 International Conference on Cyberspace Innovation of Advanced Technologies, Guangzhou, China, 4–6 December 2020; pp. 277–282.
  13. Sivanathan, A.; Gharakheili, H.H.; Loi, F.; Radford, A.; Wijenayake, C.; Vishwanath, A.; Sivaraman, V. Classifying IoT devices in smart environments using network traffic characteristics. IEEE Trans. Mob. Comput. 2018, 18, 1745–1759.
  14. Antonakakis, M.; April, T.; Bailey, M.; Bernhard, M.; Bursztein, E.; Cochran, J.; Durumeric, Z.; Halderman, J.A.; Invernizzi, L.; Kallitsis, M.; et al. Understanding the Mirai botnet. In Proceedings of the 26th USENIX Security Symposium (USENIX Security 17), Vancouver, BC, Canada, 16–18 August 2017; pp. 1093–1110.
  15. Li, Q.; Feng, X.; Wang, R.; Li, Z.; Sun, L. Towards fine-grained fingerprinting of firmware in online embedded devices. In Proceedings of the IEEE INFOCOM 2018-IEEE Conference on Computer Communications, Honolulu, HI, USA, 16–19 April 2018; pp. 2537–2545.
  16. Feng, X.; Li, Q.; Wang, H.; Sun, L. Acquisitional rule-based engine for discovering Internet-of-Thing devices. In Proceedings of the 27th USENIX Security Symposium, Baltimore, MD, USA, 15–17 August 2018; pp. 327–341.
  17. Duarte, F.S.L.G.; Sikansi, F.; Fatore, F.M.; Fadel, S.G.; Paulovich, F.V. Nmap: A novel neighborhood preservation space-filling algorithm. IEEE Trans. Vis. Comput. Graph. 2014, 20, 2063–2071.
  18. Yang, K.; Li, Q.; Sun, L. Towards automatic fingerprinting of IoT devices in the cyberspace. Comput. Netw. 2019, 148, 318–327.
  19. Durumeric, Z.; Wustrow, E.; Halderman, J.A. ZMap: Fast Internet-wide Scanning and Its Security Applications. In Proceedings of the 22nd USENIX Security Symposium (USENIX Security 13), Washington, DC, USA, 14–16 August 2013; pp. 605–620.
  20. Cheng, Y.; Ji, X.; Zhang, J.; Xu, W.; Chen, Y.C. DemicPU: Device fingerprinting with magnetic signals radiated by CPU. In Proceedings of the ACM Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 1149–1162.
  21. Park, S.Y.; Lim, S.; Jeong, D.; Lee, J.; Yang, J.S.; Lee, H. PUFSec: Device fingerprint-based security architecture for Internet of Things. In Proceedings of the IEEE INFOCOM, Atlanta, GA, USA, 1–4 May 2017.
  22. Sanchez-Rola, I.; Santos, I.; Balzarotti, D. Clock around the clock: Time-based device fingerprinting. In Proceedings of the ACM Conference on Computer and Communications Security, Toronto, ON, Canada, 15–19 October 2018; pp. 1502–1514.
  23. Meidan, Y.; Bohadana, M.; Shabtai, A.; Guarnizo, J.D.; Ochoa, M.; Tippenhauer, N.O.; Elovici, Y. ProfilIoT: A machine learning approach for IoT device identification based on network traffic analysis. In Proceedings of the ACM Symposium on Applied Computing, Pisa, Italy, 21–24 March 2017; Association for Computing Machinery: New York, NY, USA, 2017; pp. 506–509.
  24. Meidan, Y.; Bohadana, M.; Shabtai, A.; Ochoa, M.; Tippenhauer, N.O.; Guarnizo, J.D.; Elovici, Y. Detection of unauthorized IoT devices using machine learning techniques. arXiv 2017, arXiv:1709.04647.
  25. Sivanathan, A.; Sherratt, D.; Gharakheili, H.H.; Radford, A.; Wijenayake, C.; Vishwanath, A.; Sivaraman, V. Characterizing and classifying IoT traffic in smart cities and campuses. In Proceedings of the 2017 IEEE Conference on Computer Communications Workshops, INFOCOM WKSHPS, Atlanta, GA, USA, 1–4 May 2017; pp. 559–564.
  26. Santos, M.R.; Andrade, R.M.; Gomes, D.G.; Callado, A.C. An efficient approach for device identification and traffic classification in IoT ecosystems. In Proceedings of the IEEE Symposium on Computers and Communications, Natal, Brazil, 25–28 June 2018; pp. 304–309.
  27. Fki, Z.; Ammar, B.; Ayed, M.B. Machine learning with Internet of Things data for risk prediction: Application in ESRD. In Proceedings of the International Conference on Research Challenges in Information Science, Barcelona, Spain, 17–20 May 2018; pp. 1–6.
  28. Shen, Y.Z.; Gu, C.X.; Chen, X.; Zhang, X.L.; Lu, Z.Y. Vulnerability analysis of OpenVPN system based on model learning. Ruan Jian Xue Bao/J. Softw. 2019, 30, 3750–3764.
  29. Shaikh, F.; Bou-Harb, E.; Crichigno, J.; Ghani, N. A Machine Learning Model for Classifying Unsolicited IoT Devices by Observing Network Telescopes. In Proceedings of the 2018 14th International Wireless Communications and Mobile Computing Conference, IWCMC 2018, Limassol, Cyprus, 25–29 June 2018; pp. 938–943.
  30. Thangavelu, V.; Divakaran, D.M.; Sairam, R.; Bhunia, S.S.; Gurusamy, M. DEFT: A Distributed IoT Fingerprinting Technique. IEEE Internet Things J. 2019, 6, 940–952.
  31. Maiti, R.R.; Siby, S.; Sridharan, R.; Tippenhauer, N.O. Link-layer device type classification on encrypted wireless traffic with COTS radios. In Lecture Notes in Computer Science, Proceedings of the European Symposium on Research in Computer Security, Oslo, Norway, 11–15 September 2017; Springer: Berlin/Heidelberg, Germany, 2017; Volume 10493, pp. 247–264.
  32. Apthorpe, N.; Reisman, D.; Sundaresan, S.; Narayanan, A.; Feamster, N. Spying on the smart home: Privacy attacks and defenses on encrypted IoT traffic. arXiv 2017, arXiv:1708.05044.
  33. Clarke, M.R.B.; Duda, R.O.; Hart, P.E. Pattern Classification and Scene Analysis. J. R. Stat. Soc. Ser. A Gen. 1974, 137, 442.
  34. Zhu, F.; Liu, L.; Meng, W.; Lv, T.; Hu, S.; Ye, R. SCAFFISD: A scalable framework for fine-grained identification and security detection of wireless routers. In Proceedings of the 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Guangzhou, China, 29 December–1 January 2021; pp. 1194–1199.
  35. Samtani, S.; Yu, S.; Zhu, H.; Patton, M.; Matherly, J.; Chen, H. Identifying SCADA systems and their vulnerabilities on the internet of things: A text-mining approach. IEEE Intell. Syst. 2018, 33, 63–73.
  36. Feng, X.; Li, Q.; Han, Q.; Zhu, H.; Liu, Y.; Cui, J.; Sun, L. Active profiling of physical devices at internet scale. In Proceedings of the 2016 25th International Conference on Computer Communications and Networks, ICCCN 2016, Waikoloa, HI, USA, 1–4 August 2016.
  37. Huang, S.; Ontañón, S. A closer look at invalid action masking in policy gradient algorithms. arXiv 2020, arXiv:2006.14171.
Figure 1. Examples of protocol responses.
Figure 2. Examples of web features.
Figure 3. Overview of our framework.
Figure 4. The process of data analysis.
Figure 5. The probability of obtaining device fingerprints.
Figure 6. The success rate of device fingerprint identification.
Figure 7. Time efficiency compared with another work: (a) success rate of brand; (b) success rate of model; (c) success rate of version.
Table 1. Typical open port combinations.
Device Type | Protocol Type | Default Ports
Webcam | HTTP/HTTPS | 81, 80, 82, 8080, 443, 8443
Webcam | RTSP | 554
Webcam | Data Service | 8000, 37777
Webcam | Onvif | 80, 2020, 3702
Printer | HTTP/HTTPS | 80, 8000, 8080, 8081, 443, 8443
Printer | IPP | 631
Printer | FTP | 21, 9600
Printer | Telnet | 23
Printer | PJL | 9100, 9101, 9102, 515
Firewall | HTTP/HTTPS | 443, 10443, 8443, 80, 8888, 8080
Firewall | SSH | 22
Firewall | Telnet | 23
Firewall | SNMP | 161
NAS | HTTP/HTTPS | 80, 8080, 443, 8443, 5000, 5001, 8000
NAS | UPnP | 1900
NAS | FTP | 21
NAS | NDMP | 10000
NAS | Rpcbind | 111
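
As a rough illustration of how the port combinations in Table 1 can be used as a lookup structure, the sketch below inverts the table into an index from port number to the device types and protocols that list it. The names `TYPICAL_PORTS`, `build_port_index`, and `candidate_types` are hypothetical and introduced only for this example; the data are transcribed from Table 1, and the code is not part of the prototype described in this article.

```python
from collections import defaultdict

# Default ports transcribed from Table 1.
TYPICAL_PORTS = {
    "Webcam": {
        "HTTP/HTTPS": [81, 80, 82, 8080, 443, 8443],
        "RTSP": [554],
        "Data Service": [8000, 37777],
        "Onvif": [80, 2020, 3702],
    },
    "Printer": {
        "HTTP/HTTPS": [80, 8000, 8080, 8081, 443, 8443],
        "IPP": [631],
        "FTP": [21, 9600],
        "Telnet": [23],
        "PJL": [9100, 9101, 9102, 515],
    },
    "Firewall": {
        "HTTP/HTTPS": [443, 10443, 8443, 80, 8888, 8080],
        "SSH": [22],
        "Telnet": [23],
        "SNMP": [161],
    },
    "NAS": {
        "HTTP/HTTPS": [80, 8080, 443, 8443, 5000, 5001, 8000],
        "UPnP": [1900],
        "FTP": [21],
        "NDMP": [10000],
        "Rpcbind": [111],
    },
}


def build_port_index(table):
    """Map each port to the set of (device type, protocol) pairs listing it."""
    index = defaultdict(set)
    for device_type, protocols in table.items():
        for protocol, ports in protocols.items():
            for port in ports:
                index[port].add((device_type, protocol))
    return index


def candidate_types(index, port):
    """Return the sorted device types whose default ports include `port`."""
    return sorted({device_type for device_type, _ in index.get(port, ())})


index = build_port_index(TYPICAL_PORTS)
print(candidate_types(index, 554))   # only webcams list the RTSP port
print(candidate_types(index, 8000))  # shared by several device types
```

Ports such as 554 or 631 point to a single device type, whereas ports such as 80 or 8000 are shared and carry little information by themselves, which is one reason a single open port is rarely sufficient for identification.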
Table 2. Sample device models for the real-world test.
Brand | Number of Models | Number of Devices | Main Model Series (Type)
Cisco | 946 | 15,186 | CSR, DPQ, ASR, RV (Router, Switch); TANDBERG, Codian (Webcam); ASA (Firewall)
Huawei (H3C) | 776 (114) | 20,977 | AR, HG, EG, Quidway, WX, CR, SI (Router, Switch); Secoway, Eudemon, ASG, SecPath (Firewall); IPC, HiSilicon (Webcam)
D-Link | 666 | 12,270 | DI, DCM, DSL, DIR (Router, Switch); DCS, DSH (Webcam)
Juniper | 483 | 1333 | MX, ERS, EX (Router, Switch); SRX (Firewall)
HP | 370 | 850 | ProCurve, SuperStack, NR (Router, Switch); LaserJet, OfficeJet (Printer)
Synology | 191 | 718 | DiskStation, CubeStation (NAS)
Dahua | 117 | 1219 | DHI, HCVR, NVR (Webcam)
Fortinet | 86 | 553 | FortiGate, FortiManager (Firewall)
Table 3. Success rate compared with other work.
Work | Approach | Features | Success Rate (Type) | Success Rate (Brand) | Success Rate (Model) | Success Rate (Version)
[36] | Protocol Banner Fingerprints | Protocol Banners | over 90% | NA | NA | NA
Our Work | Probe Scheduling | Multi-layer features | 93.43% | 96.89% | 93.43% | 83.71%
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Liang, C.; Yu, B.; Xie, W.; Wang, B.; Peng, W. Fine-Grained Identification for Large-Scale IoT Devices: A Smart Probe-Scheduling Approach Based on Information Feedback. Appl. Sci. 2022, 12, 8335. https://doi.org/10.3390/app12168335
