1. Introduction
Urbanization has been a major driver of economic and social development, but it is accompanied by persistent global and regional problems, such as traffic congestion, air pollution, and noise. The concept of the smart city has been introduced, and relevant technologies have been developed to address these challenges and improve the livability of urban areas [1,2]. A university campus resembles a city in several respects, notably its high density of inhabitants and intensive infrastructure usage. Researchers have therefore adopted the concepts, technologies, and solutions of smart cities to implement smart campus initiatives covering the learning environment, energy management, safety and security, and infrastructure management [2]. In the framework of smart cities, geographic information system (GIS) technologies together with geospatial data have been used effectively, and geo-visualization is a key interface in nearly all smart city projects [3].
Data visualization is an efficient and intuitive way to assist people in interpreting the patterns behind data [4]. Dynamic spatial patterns and geographic knowledge can be uncovered through visual exploration and analysis of large-scale movement data [5]. Trajectory data are a key source for surveillance and management of mobile agents, whether they are vessels [6] or individual inhabitants [7]. To implement a successful smart campus, university management needs to understand campus inhabitants’ mobility behaviors in every dimension, of which location and time are the two basic factors. Spatial occupancy of campus space is one of the critical indicators; it measures where and when inhabitants are present in each room, floor, building, or open space [8,9,10,11]. Dynamic geo-visualization of spatial occupancy can help save energy [12,13,14,15], optimize space utilization [9,10], and analyze risks of epidemic transmission [16,17]. Geo-visualization techniques play an essential and unique role in revealing spatio-temporal patterns of human distribution and infrastructure utilization [11], making campus management more efficient and effective.
To calculate spatial occupancy, people’s positions must first be collected using location-based technologies. A number of approaches for acquiring human positions have been proposed and implemented based on one or a combination of digital instruments, which are either deployed onsite or installed on smart devices. Passive infrared (PIR) sensors can be mounted at fixed locations to monitor and count moving objects in target areas. They are well accepted with respect to privacy. However, the extra cost of installation and maintenance [18] and false positive detections [19] limit their application. Surveillance cameras have been used to derive more detailed information about occupancy patterns [20], but they are regarded as highly intrusive to occupants’ privacy and incur high computational costs [19,21]. Card swiping data collected from security guards or automatic fare collection systems can help count people flow for space occupancy information [22]. However, additional equipment is necessary, and room-level counting is almost impossible in the context of a university campus. Smart devices, such as smartphones and smart watches, equipped with global navigation satellite system (GNSS) receivers have been used to acquire people’s real-time trajectories, which has become an effective way of calculating spatial occupancy [23,24]. However, this approach requires applications to be pre-installed on users’ devices and is not effective in indoor environments. Research based on call detail records (CDR) from mobile phone telecommunication data [25,26] avoids the pre-installation of applications, but its accuracy is insufficient for work at the room and building levels.
As a basic information infrastructure service, wireless networks are available in nearly every corner of almost all university campuses to support education and research activities. When a mobile device, such as a laptop computer or a smartphone, connects to the wireless network, the networking system records connection information as system log data (Syslog), including the time at which the device connects to or disconnects from the network, its Media Access Control (MAC) address, and the name of the access point (AP) with which the device interacts. Each AP has a unique MAC address and is mounted in a specific position. It broadcasts network data over radio frequency (RF) signals whose strength varies in space, which can be used to estimate the distance between an AP and a smart device connected to it. Human position tracking based on Wi-Fi networks has been receiving increasing attention [9,10,14,16,18,19], especially in campus applications.
The occupancy data are usually presented as a set of numeric values, which can be easily visualized by using bar charts [27], line charts [11,22], choropleth maps, graduated symbols on a 2D floor map [11,28], etc. Using a line chart to visualize the occupancy of spatial zones makes it easy to compare the occupancy at different times, thereby facilitating the analysis of changes in occupancy during different periods. However, this approach has certain limitations, particularly when it comes to comparing multiple spatial zones. Geo-visualization of spatial occupancy in different campus spaces can be implemented simply on 2D maps for a holistic overview. However, this approach struggles to capture spatial occupancy of vertical structures in buildings and cannot provide a comprehensive comparison of occupancy across all spatial zones within single or multiple buildings.
In this paper, Wi-Fi Syslog data are used to calculate and visualize spatial occupancy on a university campus. First, 3D building models of the campus are reconstructed based on LiDAR and construction drawings. Then, a Wi-Fi Syslog data preprocessing procedure is designed to extract the time-varying set of online devices. Each AP is located according to the original building CAD (Computer Aided Design) drawings or onsite measurement and is assigned to its corresponding spatial zones. A spatial zone is defined as a room, a floor, a corridor, or a building. The number of online devices connected to the Wi-Fi network changes remarkably over the day due to day and night human behavior, power management of the devices, etc. To reduce these uncertainties, a ratio is formulated based on the average number of online devices during the research period. Geo-visualization from the perspective of space, time, and humans is designed and implemented. Based on the 3D building models, a 3D geo-visualization system is implemented to generate interactive visualization of spatial occupancy. Our main contribution is the design and implementation of a 3D geo-visualization method that presents spatial occupancy based on structured Wi-Fi log data.
The rest of the paper is organized as follows. Section 2 introduces the methodology, including the data used, reconstruction of 3D building models, preprocessing of Wi-Fi Syslog data, extraction of online devices, and design of the online 3D geo-visualization system. The results are shown in Section 3 to demonstrate the effectiveness of the spatial occupancy visualization. Section 4 provides the discussion. Conclusions are presented in Section 5.
2. Methodology
2.1. Data
The research area is the main campus of Capital Normal University. There are nine teaching and research buildings, eleven dormitory buildings, four administrative buildings, two dining buildings, one library building, and a number of facilities buildings on the campus. The Wi-Fi network covers all campus buildings and nearly all open spaces of the campus. There are over 3000 APs installed in most rooms and open spaces. To reconstruct 3D building models, LiDAR point clouds of the campus and construction drawings of buildings are used.
The Wi-Fi Syslog data span 70 days in a spring semester on one university campus. The data are the output of the network login and logout data in an unstructured text format from the Wi-Fi management system. Basically, each record contains information about the MAC addresses of devices, time of network login and logout, previous AP’s name of network connection, current AP’s name of network connection, online status of devices, received signal strength indicator (RSSI), operating system (OS) of corresponding devices, connection session length, etc. The MAC address of each device was anonymized by the owner of the data to avoid privacy issues.
Each AP has a unique name on the Wi-Fi network. An AP’s name normally indicates the space it is located in. In these cases, an AP can be attached to a specific room or an open space. For those APs that do not bear space information in their names, an AP scanner (Figure 1) based on the ESP8266 microcontroller was built to scan the APs’ names, which were then attached to specific spaces.
2.2. Three-Dimensional Architecture Modelling
The 3D building models of the campus are crucial for the spatialization of devices and the visualization of spatial occupancy. To reconstruct 3D building models, LiDAR point clouds of the campus were first used to determine the envelopes of buildings. The average point density of the LiDAR data is 16 points per square meter. Building footprints were manually extracted. The height of each floor was estimated as an average, based on the height of the given building and its number of floors. Scanned images of CAD drawings were acquired and georeferenced in QGIS to align with the building footprints from the LiDAR point clouds, which are in geographic coordinates. Each room footprint was digitized semi-automatically into 2D vector data. The height of each room was set to the corresponding floor height. Then, 3D room models were reconstructed based on the 2D data and their heights. A number of rooms or lecture halls cross multiple floors, and their heights were set according to the heights of the floors they cross. The height of each room is used to conduct an extrusion during visualization. At the same time, attributes were attached to each room, including room name and number, room function, and the list of names of the APs in the room.
The 2D footprints of rooms on the same floor were aggregated to derive the footprints of the given floor. Both rooms and floors can be taken as indoor spatial zones for spatial occupancy visualization.
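The height assignment described above can be illustrated with a short sketch. It assumes that floors are numbered from 1, that the ground floor starts at height 0, and that a room crossing several floors spans the sum of their averaged heights; the function names and these conventions are illustrative assumptions rather than the procedure actually used.

```python
def floor_height(building_height_m: float, floor_count: int) -> float:
    """Average floor height from building height and number of floors."""
    if floor_count <= 0:
        raise ValueError("floor_count must be positive")
    return building_height_m / floor_count


def room_extrusion(building_height_m: float, floor_count: int,
                   first_floor: int, floors_crossed: int = 1):
    """Return (base, top) heights in metres for extruding a room footprint.

    Assumption: floors are numbered from 1 and floor 1 starts at height 0.
    """
    h = floor_height(building_height_m, floor_count)
    base = (first_floor - 1) * h
    top = base + floors_crossed * h
    return base, top


# A lecture hall starting on floor 2 and crossing two floors of a
# five-floor, 20 m high building.
print(room_extrusion(20.0, 5, first_floor=2, floors_crossed=2))
```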
2.3. Wi-Fi Syslog Processing
As indicated above, the Syslog contains a number of descriptions of devices connected to the Wi-Fi network. Each connection of a device to the Wi-Fi network is considered an event, which corresponds to one record in the Syslog data. Four types of information can be extracted from the Syslog data: (1) when: the event occurrence time; (2) who: the anonymized MAC address of the given device; (3) where: the name of the AP(s) the device interacts with; and (4) what: the event category, which indicates whether the device is joining or leaving the network or roaming from one AP to another. Besides these 4Ws, the RSSI values strongly correlate with the distance between devices and APs [29], which can help determine the position of a given device with higher accuracy [9]. The OS of each device (Windows, Android, iOS, or others) is also recorded in the Syslog and can be used to distinguish the type of device. Therefore, Syslog records related to mobile devices can be extracted based on OS, excluding fixed devices (such as desktop computers and IoT devices) from the occupancy calculation.
The original Wi-Fi Syslog was provided in an unstructured plain text format, in which some missing numerical values are filled with various random text placeholders. Therefore, a structural extraction was conducted on the Syslog, and the extracted results were exported into a database (Table 1).
For clarity, a session is defined as the time period from when a device connects to an AP until it disconnects from that AP. If the device roams from one AP to another and then logs off from the second one, two sessions are defined for this process. For the case in Table 1, each of the three roam records is split into two records with the same event time: an offline record at the origin AP and an online record at the destination AP.
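The splitting of roam records can be sketched as follows. The record layout (field names, the category labels "Online", "Offline", and "Roam", and the example MAC string) is a hypothetical stand-in for the structured table, not the actual schema of Table 1.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RawEvent:
    """One structured Syslog event (hypothetical field names)."""
    time: str
    mac: str
    category: str                    # "Online", "Offline", or "Roam"
    ap: str                          # current (destination) AP
    origin_ap: Optional[str] = None  # previous AP, set for roam events


@dataclass(frozen=True)
class Record4W:
    time: str    # when
    mac: str     # who
    ap: str      # where
    status: str  # what: "Online" or "Offline"


def split_roams(events: List[RawEvent]) -> List[Record4W]:
    """Replace each roam event by an offline record at the origin AP and an
    online record at the destination AP, both with the same event time."""
    out: List[Record4W] = []
    for e in events:
        if e.category == "Roam":
            out.append(Record4W(e.time, e.mac, e.origin_ap, "Offline"))
            out.append(Record4W(e.time, e.mac, e.ap, "Online"))
        else:
            out.append(Record4W(e.time, e.mac, e.ap, e.category))
    return out


events = [
    RawEvent("09:30:00", "mac-0001", "Online", "TB1-2F06"),
    RawEvent("09:42:00", "mac-0001", "Roam", "TB1-2F02", origin_ap="TB1-2F06"),
]
for record in split_roams(events):
    print(record)
```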
Due to the instability and uncertainty of the wireless network connections of mobile devices, the Syslog contains noise and missing data. Time overlaps may occur between neighboring network sessions. Figure 2a shows the status of the sessions of the device listed in Table 1. The shift from Session 1 to Session 2 happened simultaneously, which is a reasonable roaming event. Session 3 started at 9:40, but Session 2 extended beyond 9:40, which indicates a late offline report from the AP named TB1-2F02; Session 2 probably ended at 9:40. The device roamed from TB1-2F06 to TB1-2F02 at about 9:42, which created an offline record at TB1-2F06. The network system then waited for the device to send a disassociation message until a timeout occurred, and the network forced the device off TB1-2F06 again. Therefore, a “dual offline” pattern may appear in the 4W database. Session 4 ended at 9:45 and Session 5 started about 30 s later, which is reasonable for a re-connection between a device and the network.
To reduce the uncertainty of the data, a data cleaning procedure was designed and implemented to correct inconsistencies among sessions in the table containing the 4W information. The pseudo-code of the 4W data cleaning procedure is shown in Algorithm 1, and the cleaned result of Figure 2a is illustrated in Figure 2b.
Some devices may have connected to the network before the start time of the Syslog acquisition, so their 4W records may begin with an offline record. In this case, the missing online record should be made up before data cleaning, with its online time set to the start time of the Syslog acquisition. Similarly, a missing offline record at the very end of the Syslog should also be made up (i.e., Session 6 in Figure 2), but this needs to be performed after the data cleaning procedure because the exact online record to be matched must be found first.
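The two make-up steps can be sketched as below. The sketch assumes that each device's records are sorted by time, that times are expressed in seconds from the start of the acquisition, and that a made-up record inherits the AP of the record it is paired with; the record type and function names are illustrative.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    time: int    # seconds since the start of Syslog acquisition (assumed unit)
    ap: str
    status: str  # "Online" or "Offline"


def make_up_leading_online(records: List[Record], start_time: int) -> List[Record]:
    """Before cleaning: if the first record is offline, prepend an online
    record at the acquisition start time on the same AP."""
    if records and records[0].status == "Offline":
        return [Record(start_time, records[0].ap, "Online")] + list(records)
    return list(records)


def make_up_trailing_offline(cleaned: List[Record], end_time: int) -> List[Record]:
    """After cleaning: if the last record is online, append an offline
    record at the acquisition end time on the same AP."""
    if cleaned and cleaned[-1].status == "Online":
        return list(cleaned) + [Record(end_time, cleaned[-1].ap, "Offline")]
    return list(cleaned)


raw = [Record(120, "TB1-2F06", "Offline"), Record(300, "TB1-2F02", "Online")]
print(make_up_leading_online(raw, start_time=0))
print(make_up_trailing_offline(raw, end_time=86400))
```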
Algorithm 1: The 4W Data Cleaning Algorithm
Input: A list of uncleaned 4W records of a device, RAW4W
Output: A list of the cleaned 4W records of the device, CLN4W
OpenSession = null
for each Row in RAW4W do
    if Row.Status is “Online” then
        if OpenSession == null then
            CLN4W.Pushback(Row)
            OpenSession = Row
        else
            make up an offline record RMakeup of the open session with attributes: Status = “Offline”, Time = Row.Time, AP = OpenSession.AP
            CLN4W.Pushback(RMakeup)
            CLN4W.Pushback(Row)
            OpenSession = Row
        end if
    else
        if OpenSession == null then
            continue // No corresponding online record exists; treat it as an exceptional record.
        else
            if OpenSession.AP == Row.AP then
                CLN4W.Pushback(Row)
                OpenSession = null
            else
                continue // An offline record of another session; treat it as an exceptional record.
            end if
        end if
    end if
end for
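For readers who prefer executable code, the cleaning logic of Algorithm 1 can be transcribed into Python roughly as follows. The record type, with integer times and the fields `time`, `ap`, and `status`, is an assumption made for illustration; only the control flow follows the pseudo-code.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    time: int    # event time (illustrative integer seconds)
    ap: str
    status: str  # "Online" or "Offline"


def clean_4w(raw: List[Record]) -> List[Record]:
    """Enforce strictly alternating online/offline records per device."""
    cleaned: List[Record] = []
    open_session: Optional[Record] = None
    for row in raw:
        if row.status == "Online":
            if open_session is not None:
                # The previous session was never closed: make up its offline
                # record at the time the new session starts.
                cleaned.append(Record(row.time, open_session.ap, "Offline"))
            cleaned.append(row)
            open_session = row
        elif open_session is not None and open_session.ap == row.ap:
            # Offline record that matches the open session: close it.
            cleaned.append(row)
            open_session = None
        # Otherwise the offline record has no matching open session
        # (late or duplicated report) and is dropped as exceptional.
    return cleaned


raw = [
    Record(0, "TB1-2F06", "Online"),
    Record(10, "TB1-2F02", "Online"),    # roam reported before the offline
    Record(12, "TB1-2F06", "Offline"),   # late offline of the first session
    Record(20, "TB1-2F02", "Offline"),
    Record(25, "TB1-2F02", "Offline"),   # "dual offline"
]
for record in clean_4w(raw):
    print(record)
```

In this example the late offline record at time 12 and the duplicated offline record at time 25 are discarded, and an offline record for TB1-2F06 is made up at time 10.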
2.4. Device Counting and Occupancy Calculation
Campus inhabitants are moving most of the time, which is reflected by devices’ moving on network connections. Counting devices connected with APs in a specific room at a specific moment can largely be used to represent the spatial occupancy of inhabitants in this room at this moment. For a specific floor, the results of all rooms and open spaces on this floor can be summed up to derive the corresponding spatial occupancy. Similarly, for a specific building, the results of all rooms and open spaces in this building can be summed up to obtain the building’s spatial occupancy at one moment.
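A minimal sketch of this counting and aggregation is given below. It assumes that cleaned sessions are available as (MAC, AP, start, end) tuples with half-open time intervals and that lookup tables from AP to room and from room to floor exist; the table contents and room identifiers are invented for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Session = Tuple[str, str, int, int]  # (mac, ap, start, end), end exclusive


def count_devices(sessions: Iterable[Session],
                  ap_to_room: Dict[str, str], t: int) -> Dict[str, int]:
    """Number of distinct devices online in each room at moment t."""
    devices = defaultdict(set)
    for mac, ap, start, end in sessions:
        if start <= t < end and ap in ap_to_room:
            devices[ap_to_room[ap]].add(mac)
    return {room: len(macs) for room, macs in devices.items()}


def aggregate(zone_counts: Dict[str, int],
              zone_to_parent: Dict[str, str]) -> Dict[str, int]:
    """Sum the counts of child zones (e.g., rooms) into parent zones (e.g., floors)."""
    totals = defaultdict(int)
    for zone, n in zone_counts.items():
        totals[zone_to_parent[zone]] += n
    return dict(totals)


# Illustrative lookup tables and sessions (not taken from the data set).
ap_to_room = {"TB1-2F02": "TB1-202", "TB1-2F06": "TB1-206"}
room_to_floor = {"TB1-202": "TB1-2F", "TB1-206": "TB1-2F"}
sessions = [
    ("mac-a", "TB1-2F02", 0, 100),
    ("mac-b", "TB1-2F02", 50, 200),
    ("mac-c", "TB1-2F06", 0, 60),
]
rooms = count_devices(sessions, ap_to_room, t=55)
print(rooms, aggregate(rooms, room_to_floor))
```

The same `aggregate` function can be applied again with a floor-to-building table to obtain building-level counts.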
Generally, the campus population remains stable in both daytime and night-time. However, the device count drops at night-time because some smart devices are turned off. In addition, larger spaces normally host more people, so more devices are detected there via the Wi-Fi network. The raw count of online devices in a specific space therefore cannot reflect its relative spatial occupancy, and the deviation of spatial occupancy from its normal level is of significance to campus managers. To investigate how the occupancy of a given space changes relative to its normal level, a ratio formula is implemented:

$$R_{dt} = \frac{C_{dt}}{B_{t}} \tag{1}$$

where $C_{dt}$ stands for the observed device count at the $t$-th second of day $d$; $B_{t}$ stands for the baseline of normalcy at the $t$-th second, which best represents the normal condition during the research period and can be the mean or the median of the online device counts at the same moment on all dates; and $R_{dt}$ is the ratio of the observed online count to the baseline value, which indicates whether the occupancy is higher or lower than normal.
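The ratio of the observed count to the baseline can be computed as in the following sketch. It assumes the counts are arranged as one list per day, indexed by the same time slots, and returns NaN where the baseline is zero; this layout and the zero-baseline handling are assumptions made for the example.

```python
import math
from statistics import mean, median
from typing import List, Sequence


def baseline(counts_by_day: Sequence[Sequence[float]],
             use_median: bool = False) -> List[float]:
    """B_t: mean (or median) of the counts at the same moment t over all days."""
    agg = median if use_median else mean
    return [agg(slot) for slot in zip(*counts_by_day)]


def ratio(counts_by_day: Sequence[Sequence[float]],
          base: Sequence[float]) -> List[List[float]]:
    """R_dt = C_dt / B_t for every day d and moment t (NaN if B_t is 0)."""
    return [[c / b if b else math.nan for c, b in zip(day, base)]
            for day in counts_by_day]


counts = [[10, 20], [30, 20]]   # two days, two time slots (made-up values)
b = baseline(counts)            # [20, 20]
print(ratio(counts, b))         # [[0.5, 1.0], [1.5, 1.0]]
```

A ratio above 1 means the zone is busier than its baseline at that moment, and a ratio below 1 means it is quieter.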
2.5. Geo-Visualization of Spatial Occupancy
Spatial occupancy of different zones (e.g., room, floor, corridor, and building) can be presented in various ways, including statistical tables, bar charts, pie charts, flow charts, and many other types of charts. However, geo-visualization is unique in presenting geospatial patterns and temporal dynamics, which can deliver information more efficiently.
To present changes in spatial occupancy at the room, floor, corridor, and building levels, geo-visualization both at a specific moment and over a specific time period is important. This study uses color schemes to represent the actual value and the density value of spatial occupancy. The values are normalized to 0–1. To enhance the visual saliency of spatial occupancy in different zones and at different moments, the normalization can be implemented with different maximal values (the minimal value is 0). The normalized value ($V$) of spatial occupancy can be calculated by:

$$V = \frac{R_{dt}}{\mathrm{Max}(R)} \tag{2}$$

where $R_{dt}$ is the ratio calculated by Formula (1), representing the spatial occupancy of a specific spatial zone; and the $\mathrm{Max}()$ function calculates the maximum value used for normalization, which is determined by the area of interest or the time of interest.
Table 2 presents various conditions for determining maximum occupancy values.
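As a hedged illustration of the normalization, the sketch below divides each ratio by a maximum that is either taken from the supplied values or passed in explicitly by the caller (for instance, a maximum over a wider area or time window). Clamping the result to 0–1 when an external maximum is exceeded is an assumption of this sketch, not a rule stated in the paper.

```python
from typing import List, Optional, Sequence


def normalize(ratios: Sequence[float],
              max_value: Optional[float] = None) -> List[float]:
    """Scale occupancy ratios to 0-1 using the chosen maximum (minimum is 0)."""
    m = max(ratios) if max_value is None else max_value
    if m <= 0:
        return [0.0 for _ in ratios]
    # Clamp so that values above an externally supplied maximum stay at 1.
    return [min(max(r / m, 0.0), 1.0) for r in ratios]


print(normalize([0.5, 1.0, 2.0]))                 # maximum taken from the data
print(normalize([0.5, 1.0, 2.0], max_value=1.0))  # maximum supplied by the caller
```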
A set of colors is attached to a specific spatial zone according to its normalized spatial occupancy at a specific moment. A 3D object of a spatial zone is represented as a 3D polygon with an extruded height and corresponding attributes. A color scheme together with a corresponding timestamp sequence is attached to each 3D object. Original data have a high temporal resolution of up to 1 s. To reduce data volume and network transmission time, a color scheme and its corresponding timestamps can be resampled. A linear interpolation is implemented between two neighboring timestamps to improve visual effect during visualization.
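The color assignment, resampling, and interpolation steps can be sketched as follows. The white-to-red ramp, the fixed resampling step, and the function names are illustrative assumptions; the actual color scheme and resampling interval are not specified here.

```python
from typing import List, Sequence, Tuple


def value_to_rgba(v: float, alpha: int = 255) -> Tuple[int, int, int, int]:
    """Map a normalized occupancy value (0-1) to a white-to-red RGBA color."""
    v = min(max(v, 0.0), 1.0)
    fade = round(255 * (1.0 - v))
    return (255, fade, fade, alpha)


def resample(values: Sequence[float], step: int) -> List[Tuple[int, float]]:
    """Keep every `step`-th per-second value as a (second, value) sample."""
    return [(t, values[t]) for t in range(0, len(values), step)]


def interpolate(samples: Sequence[Tuple[int, float]], t: float) -> float:
    """Linearly interpolate between the two samples that bracket time t."""
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the sampled time range")


per_second = [0.0, 0.1, 0.2, 0.3, 0.4]
samples = resample(per_second, step=2)   # [(0, 0.0), (2, 0.2), (4, 0.4)]
print(samples, value_to_rgba(interpolate(samples, 3)))
```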
The visualization system was designed with a browser/server structure, which enables users to access it from computers and portable devices. The browser side was implemented based on CesiumJS, an open-source JavaScript package for 3D visualization [30]. Cesium Language (CZML) is used to accommodate time-sequence 3D geographical data in JSON format and supports streaming data over the internet [31]. A geographical feature is stored as a set of geographic coordinates in the WGS84 coordinate system. Time-dynamic color data are stored as a set of RGBA values ranging from 0 to 255. The server side was implemented using C# ASP.NET, which conducts data queries from the database and builds the required CZML based on spatio-temporal conditions from user interactions. The server is deployed on Windows IIS 10.0.
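To make the data format concrete, the following sketch builds a small CZML document for one spatial zone as an extruded polygon with a time-tagged color. It is written in Python purely for illustration (the server described above is implemented in C# ASP.NET), and the zone identifier, coordinates, epoch, and color samples are invented; only the general CZML layout of a document packet followed by entity packets is relied upon.

```python
import json
from typing import Dict, List, Sequence, Tuple

Color = Tuple[int, int, int, int]


def zone_packet(zone_id: str, footprint_lonlat: Sequence[Tuple[float, float]],
                base_m: float, top_m: float, epoch_iso: str,
                color_samples: Sequence[Tuple[int, Color]]) -> Dict:
    """CZML packet for one zone: extruded polygon with time-sampled RGBA."""
    positions: List[float] = []
    for lon, lat in footprint_lonlat:
        positions.extend([lon, lat, base_m])
    rgba: List[int] = []
    for offset_s, (r, g, b, a) in color_samples:
        rgba.extend([offset_s, r, g, b, a])  # seconds after epoch, then RGBA
    return {
        "id": zone_id,
        "polygon": {
            "positions": {"cartographicDegrees": positions},
            "height": base_m,
            "extrudedHeight": top_m,
            "material": {"solidColor": {"color": {"epoch": epoch_iso, "rgba": rgba}}},
        },
    }


def build_czml(packets: List[Dict], name: str = "spatial occupancy") -> str:
    """Prepend the mandatory document packet and serialize to JSON."""
    document = {"id": "document", "name": name, "version": "1.0"}
    return json.dumps([document] + packets)


# Illustrative room footprint and two color samples one hour apart.
packet = zone_packet(
    "TB2-211",
    [(116.300, 39.930), (116.301, 39.930), (116.301, 39.931), (116.300, 39.931)],
    base_m=4.0, top_m=8.0, epoch_iso="2023-03-06T08:00:00Z",
    color_samples=[(0, (255, 255, 255, 255)), (3600, (255, 0, 0, 255))],
)
print(build_czml([packet]))
```

When such a document is loaded in CesiumJS, sampled properties are interpolated between time tags, which matches the linear interpolation between neighboring timestamps described earlier.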
A flow chart illustrating the procedure of 3D geo-visualization of spatial occupancy is shown in Figure 3.
3. Results
Typical statistical charts can present general patterns of human activities. In Figure 4, each curve represents one 24 h day. The chart shows the overall trend in the number of wireless devices online during the 70-day research period in the research region. Regular patterns can be found in this figure, for example, three prominent peaks in the morning, afternoon, and late evening, and two prominent valleys at dinner time and before early morning on workdays.
Figure 5a shows the trend of the total online device count on campus at night-time (from 1 a.m. to 5 a.m.), when the count should remain stable. However, the number of devices keeps decreasing because devices gradually disconnect from the network. Formula (1) can be applied using the mean value of the observed counts as the baseline B, and the resulting R is shown in Figure 5b. Then, variances can be calculated to evaluate the smoothness of the curves (Figure 5c). A lower variance indicates a more stable curve, which better reflects the characteristics of night-time fluctuations of the campus population.
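The smoothness check can be illustrated with a few lines of code. The sketch uses the population variance of each day's ratio curve; whether the population or the sample variance was used for Figure 5c is not stated, so this choice, like the numbers below, is an assumption.

```python
from statistics import pvariance
from typing import List, Sequence


def curve_variances(ratio_by_day: Sequence[Sequence[float]]) -> List[float]:
    """Population variance of each day's ratio curve; lower means smoother."""
    return [pvariance(day) for day in ratio_by_day]


# Made-up ratio curves: a flat night and a fluctuating night.
ratios = [[1.0, 1.0, 1.0], [0.5, 1.0, 1.5]]
print(curve_variances(ratios))
```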
Major activities on campus are undertaken by students and staff. It is necessary to review typical spatial occupancy patterns of students’ dormitory buildings and teaching buildings during work time and night-time, as well as the differences among classrooms with different course schedules. In the research area, there are four teaching buildings (TB1 to TB4). Four dormitory buildings are aggregated to Dorm A and another four to Dorm B for their neighboring positions and internal connectedness. Dorm A and Dorm B host undergraduate students, and the others are for graduate students. Classrooms for the students (primarily undergraduate students) are on the first two floors of TB1 and TB2 and the top floor of TB3 and TB4.
Figure 6 shows the time-varying spatial occupancy patterns of building floors at five typical times on a Monday. Figure 6a shows that almost all of the crowd was in the dormitory rooms before dawn. Some undergraduate students started their courses at 8 a.m.; thus, the floors where classrooms are located appear in darker red in Figure 6b. Comparing Figure 6b,c, we can see that some graduate students prefer to start working later than undergraduate students. Figure 6d shows the occupancy pattern after lunch, when most undergraduate students return to dormitory buildings because no courses are arranged at noon. It can also be observed that some of the graduate students returned to dormitory buildings for a noon break. Figure 6e was taken at 11 p.m., when the teaching buildings close. However, a minority of graduate students remained in their labs for research.
Figure 7 shows classroom occupancy on the second floor of TB2 on a Monday morning. The course schedule is used as a reference. According to the schedule, Rooms 207, 211, and 215 should be unoccupied at 8:40 a.m. (Figure 7a). However, Room 211 is in dark red, indicating high occupancy outside the schedule. To evaluate the accuracy of the geo-visualization, the class interval between 9:30 and 9:40 a.m. can be taken as an example (Figure 7b). According to the schedule, the courses in Rooms 201, 217, and 221 should have ended. There is a noticeable change in Rooms 217 and 221, and the hallway and washroom show high occupancy. The course in Room 201 ended about 5 min later.
5. Conclusions
Geo-visualization is an effective approach for the smart campus; it can assist surveillance and management and help make campus services greener and friendlier. Spatial occupancy is an essential factor in optimizing infrastructure and ensuring campus security and safety. Geo-visualization of spatial occupancy is more intuitive and efficient than traditional statistical charts. For acquiring spatial occupancy over a university campus, Wi-Fi network Syslog data have advantages over data from other location-based methods because they capture spatial activity information for most campus inhabitants. We designed and developed a procedure for cleaning and structuring Wi-Fi Syslog data for the extraction of spatial occupancy. A 3D geospatial visualization of spatial occupancy in different zones was proposed and implemented.
In this work, we first reconstructed 3D models of campus buildings and then extracted 4W data by structuring the Wi-Fi Syslog data. A preprocessing procedure was implemented to ensure data consistency and reduce uncertainties. Online devices are used as proxies to calculate the spatial occupancy of buildings at different moments. Geo-visualization of spatial occupancy is designed at the room, floor, corridor, and building levels based on normalization schemes in six scenarios. The system was implemented based on CesiumJS, and CZML was used for online data streaming to support animated 3D geo-visualization. This prototype system can support applications for data analysis, in which campus managers and educators can interpret spatial patterns of individuals in large spaces (e.g., lecture halls or library reading rooms) at various time periods. In data preprocessing, the anonymized MAC addresses in the Wi-Fi Syslog data and the data aggregation in the spatial occupancy calculation reduce privacy risks.