Mapping of BIM and GIS for Interoperable Geospatial Data Management and Analysis

Jack C.P. Cheng; Yichuan Deng


Developed tools and algorithms to allow bi-directional mapping between IFC and CityGML.


Three use cases were developed to show the implementation of BIM and GIS integration in the AEC industry.

Background



Building Information Modeling (BIM)

BIM provides a solution for interoperability in the AEC industry by providing an information backbone throughout the building lifecycle. BIM is the process of creating, storing and managing the information related to buildings throughout their whole life cycle (Eastman et al. 2008). Using BIM, the different parties involved in the building process can work on a common platform, where the cost of information sharing is much lower.

Geographic Information System (GIS)

GIS is a system to capture, store, manipulate, analyze, manage, and present all types of geographical data (Sweeney 1999). Traditionally, GIS is based on 2D maps, in which objects are assigned 2D references such as longitude and latitude. 3D GIS is now emerging as well, and some applications require the GIS and BIM models to work together.

Need for Integration

It has been shown that construction activities require data from GIS models to perform operations such as automatic site layout planning (Su et al. 2012), construction activities tracking (Cheng and Chen 2002), and waste management (Robinson and Kapo 2004). Meanwhile, BIM models and CAD data are valuable data sources for reconstruction of 3D GIS scenes. The AEC industry requires data from both BIM and GIS, and a seamless data integration between BIM and GIS should be achieved.



Data Schemas of BIM and GIS



Mapping of Neutral Data Schema

Reduces the cost and the development effort of integration

Industry Foundation Classes (IFC)

Open Standard (official International Standard ISO 16739:2013); The most popular BIM standard in the AEC industry; Represented in EXPRESS

City Geography Markup Language (CityGML)

Accepted as an official OGC standard in 2008; Semantic information; Supports five Levels of Detail (LoD); The Neutral GIS standard



Motivation of Research




 

BIM provides a solution for interoperability in the AEC industry by providing an information backbone throughout the building lifecycle. Using BIM, the different parties involved in the building process can work on a common platform, where the cost of information sharing is much lower. While BIM aims to solve the problem of interoperability between stakeholders within the AEC industry, the integration of BIM with other systems, such as GIS, is becoming increasingly important. In the AEC industry, it has been reported that more than 80% of information can be referenced to geographical information (Kim et al. 2012).

GIS is a system to capture, store, manipulate, analyze, manage, and present all types of geographical data (Sweeney 1999). Traditionally, GIS is based on 2D maps, in which objects are assigned 2D references such as longitude and latitude. 3D GIS is now emerging as well. There have been several studies concerning the application of GIS in the AEC industry. Su et al. (2012) reported a GIS-based dynamic construction site material layout evaluation for building projects. Some applications also require the GIS and BIM models to work together. For example, Strzalka et al. (2011) presented an urban-scale heating energy demand forecasting system that combines information from GIS and BIM models. Thiis and Hjelseth (2008) used BIM and GIS to enable climatic adaptations of buildings.

Previous studies have shown that the BIM domain and the GIS domain each need information from the other. The AEC industry requires data from both BIM and GIS, and a seamless data integration between BIM and GIS should be achieved. To exchange data seamlessly, automatic data mapping between the data schemas of the BIM models and the GIS models must be achieved first. While there has been a considerable amount of research effort on the integration between BIM models of different data schemas (Garrett et al. 2004; Wang et al. 2007; Wang et al. 2008), the integration between BIM models and GIS models has not been fully studied.

 





Outcome 1: Evaluation of IFC 4 Schema for the GIS Domain

Read the published paper











Outcome 2: Schema Mapping Using the Linguistic-based Method

Read the published paper




In the mapping between IFC and CityGML, the number of entities to be compared and inspected is large, so computer-aided semi-automatic approaches are needed. There are 1008 IFC entities and 608 CityGML entities defined in the schemas. If we simply inspect existing instances, the mapping between IFC and CityGML might not be fully discovered. The complexity of the data standards requires semi-automatic mapping methods that consider the context of the entities. A linguistic-based methodology framework was therefore proposed for semi-automatically mapping BIM and 3D GIS schemas. The linguistic-based method uses text-mining techniques to discover the relatedness of entities through their names and definitions. Component-based manual inspection generates the results used to validate the linguistic-based method.

The linguistic-based method uses the results of text-mining techniques to perform relatedness analysis and facilitate the mapping discovery process. The entity definitions in one schema document were extracted and compared to the entity definitions in the other schema. Pairs with higher similarity scores are more likely to be identical or related. To evaluate the similarity of the entity definitions, Cosine Similarity, the Jaccard Similarity Coefficient, and the Market Basket Model were used.
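As a rough illustration of two of the similarity measures named above, the sketch below computes the cosine similarity and the Jaccard coefficient of two token lists. It is a minimal example and not the project's Java implementation; the function names and the three token lists are hypothetical and are not quoted from the IFC or CityGML documentation.

```python
import math
from collections import Counter


def cosine_similarity(tokens_a, tokens_b):
    """Cosine of the angle between the term-count vectors of two token lists."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(tokens_a, tokens_b):
    """Size of the intersection of the two token sets divided by the size of their union."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a and not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical, already stemmed definitions (illustrative only).
ifc_window = ["window", "open", "wall", "light", "air"]
gml_window = ["window", "open", "wall", "build"]
gml_road = ["road", "transport", "traffic", "area"]

print(jaccard_similarity(ifc_window, gml_window))  # 3 shared tokens out of 6 distinct
print(cosine_similarity(ifc_window, gml_window))
print(jaccard_similarity(ifc_window, gml_road))    # no shared tokens
```

With these toy inputs the two window definitions score well above the window and road pair, which is the behavior the ranking of mapping candidates relies on.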

The first step for the calculation is entity definitions extraction. There are 1008 entities in the IFC schema, and all of them have descriptions and definitions. For the entities in the CityGML schema, considering their referencing to the GML schemas, 607 entities with definitions were found in the documentation of the CityGML and GML schemas. All the 607 entities have descriptions and definitions, which could be extracted directly from these schemas because the CityGML schema is represented in XSD (XML Schema Definition) format.

The second step is the tokenization of the entity definitions. All the stop words in the entity definitions were removed and the remaining text was stemmed for further calculation. Stop words are words that occur so frequently that they contribute little to distinguishing one piece of text from another (Strzalka et al. 2011). Some commonly seen stop words are “is”, “for” and “to”. Stemming is the process of reducing words to their stem or base form (Willett 2006). The stem of a word is not necessarily identical to its morphological root, but stemming ensures that all related words share the same stem. The Porter Stemming algorithm was adopted in this study to find the stems (Willett 2006).
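The tokenization step can be sketched as follows. This is only an illustration: the stop-word list is a small hypothetical sample, and `toy_stem` is a naive suffix stripper standing in for the Porter algorithm used in the study, so its output will differ from a real Porter stemmer on many words.

```python
import re

# Small illustrative stop-word list; a real run would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "for", "to", "of", "in", "that", "and", "or"}

# Suffixes removed by the toy stemmer, tried in this order.
# This is NOT the Porter algorithm, only a stand-in for the example.
SUFFIXES = ("ings", "ing", "es", "ed", "s")


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(definition):
    """Lower-case the text, drop stop words, and stem the remaining words."""
    words = re.findall(r"[a-z]+", definition.lower())
    return [toy_stem(w) for w in words if w not in STOP_WORDS]


print(tokenize("A window is an opening in the walls for lighting."))
# ['window', 'open', 'wall', 'light']
```

Related word forms such as "opens", "opened" and "opening" all reduce to "open" here, which is the property the comparison step depends on.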

The third step is to formalize these stemmed definitions into feature vectors for further analysis. The feature vectors were generated as follows: if concept n (e.g. window) appears m times in a definition, the n-th value of the feature vector of that definition is m. In addition to Term Frequency and Inverse Document Frequency, the entity names were also considered in the comparison. According to Lipman (2009), some mappings can be discovered directly by comparing entity names. Certain entity definitions do not repeat the entity names, so name comparison should also be considered. All the entity names were split into phrases, and the “ifc” and “gml” prefixes in entity names were removed. For example, the entity name “IfcWallStandardCase” was split into “wall standard case” and then tokenized and stemmed. The entity names were compared to other entity names as well as to entity definitions, and different weights were assigned to the different comparisons.
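A minimal sketch of the name splitting and the count-based feature vectors described above is given below. The helper names, the prefix pattern and the logarithmic IDF formula are assumptions made for illustration; the source does not state which IDF variant or which weights were used.

```python
import math
import re


def split_entity_name(name):
    """Split a CamelCase entity name into lower-case words, dropping an 'Ifc' or 'gml' prefix."""
    name = re.sub(r"^(Ifc|gml:?)", "", name)
    words = re.findall(r"[A-Z][a-z0-9]*|[a-z0-9]+", name)
    return [w.lower() for w in words]


def feature_vector(tokens, vocabulary):
    """The n-th value is the number of times the n-th vocabulary term occurs in tokens."""
    return [tokens.count(term) for term in vocabulary]


def idf_weights(documents, vocabulary):
    """Assumed IDF variant: log(N / document frequency), 0 for terms that never occur."""
    n = len(documents)
    weights = []
    for term in vocabulary:
        df = sum(1 for doc in documents if term in doc)
        weights.append(math.log(n / df) if df else 0.0)
    return weights


print(split_entity_name("IfcWallStandardCase"))  # ['wall', 'standard', 'case']

vocabulary = ["case", "door", "standard", "wall", "window"]
documents = [["wall", "wall", "window"], ["wall", "door"]]  # hypothetical token lists
print(feature_vector(documents[0], vocabulary))  # [0, 0, 0, 2, 1]
print(idf_weights(documents, vocabulary))
```

In this toy corpus "wall" occurs in every document and therefore gets an IDF weight of zero, which shows how very common terms are discounted.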

The proposed linguistic-based semi-automatic mapping framework was implemented on a platform we developed using Java. The platform mainly consists of three parts: (1) parsers for the XSD (XML Schema Definition) files of the CityGML schema and the HTML files of the IFC schema, (2) a similarity comparison engine that calculates different similarity scores, and (3) a program that reports the comparison results to a spreadsheet and generates mapping candidates. XSD files are represented in the XML format and can therefore be parsed by standard XML parsers. In this part of the research, the open-source JDOM 1.1.2 (Hunter and Lear 2012) was used to develop the parser for the XSD files of the CityGML schema. For the IFC schema, definitions of IFC entities were extracted from the IFC documentation HTML files (buildingSMART International 2007) using an HTML parser that was also developed based on JDOM. Entity names and definitions must be tokenized before similarity comparison can be conducted. In this study, the Porter Stemmer in Apache Lucene (Apache 2012) was used for the tokenization process. After tokenization, a table of all the distinct tokens in entity names and definitions was generated, which was used for generating feature vectors for similarity comparison.

As shown in the figure, the linguistic-based method can generate reliable mapping candidates. It compares the definitions and names of entities to generate mapping candidates, and entities that refer to similar content tend to have similar descriptions. The recall at a ranking threshold of 100 can reach 0.375, which means that by inspecting the first 100 candidates from the result, we could find 37.5% of the true matches. Considering the large number of entities in IFC (1008) and CityGML (607), the results from the linguistic-based method narrow the search space and reduce the human effort for mapping discovery.

The linguistic-based method does not require domain knowledge. As the methodology suggests, it is performed on the basis of syntax comparison, so users need not know all the terminology from both domains. This feature indicates that the method could be applied to other schema mapping problems. Moreover, Term Frequency and Inverse Document Frequency are applied in the process, which generates a dictionary of the words used in one domain and their frequencies. This terminology pool could be further analyzed and used for other schema mapping problems.

The linguistic-based method could also suggest candidates for 1-to-M mapping. Traditional entity-level mapping methods only consider the entity names and may not be able to find mappings between one entity and many entities. For instance, the entity “AbstractOpening” in CityGML could be mapped to “IfcWindow” or “IfcDoor”, but this mapping could not be discovered by name-to-name comparison. The linguistic-based mapping, which utilizes the definitions of entities, could broaden the scope of comparison and generate more 1-to-M mapping results. Another example from the linguistic-based mapping is the mapping between the “AbstractBoundarySurfaceType” entity in CityGML and IFC BRep entities such as “IfcBoundedSurface”, “IfcFaceOuterBound”, “IfcFaceBound”, “IfcBoundingBox”, and “IfcClosedShell”. This 1-to-M mapping could be discovered within a ranking threshold of 25.
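The recall at a ranking threshold used in this evaluation can be written down in a few lines. The sketch below is illustrative only: the function name is hypothetical and the candidate pairs are placeholder labels, not results from the study.

```python
def recall_at_k(ranked_candidates, true_matches, k):
    """Fraction of the true matches that appear among the first k ranked candidates."""
    if not true_matches:
        return 0.0
    top_k = set(ranked_candidates[:k])
    return len(top_k & set(true_matches)) / len(true_matches)


# Placeholder candidate pairs, ranked by descending similarity score (toy data).
ranked = [("A", "x"), ("B", "y"), ("C", "z"), ("D", "w")]
# Placeholder ground truth, e.g. from manual inspection (toy data).
truth = {("A", "x"), ("C", "z"), ("E", "v"), ("F", "u")}

print(recall_at_k(ranked, truth, 2))  # 0.25
print(recall_at_k(ranked, truth, 4))  # 0.5
```

Read this way, a recall of 0.375 at a threshold of 100 means that 37.5% of all true matches are already contained in the first 100 candidates.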








Outcome 3: Schema Mapping Using the Instance-based and Schema Mediation Method

Read the published paper











Outcome 4: Transformation of Different Levels of Detail (LoDs) in 3D GIS Models

Read the published paper




Among the common 3D GIS standards available today, the City Geography Markup Language (CityGML) has the most sophisticated definition of LoDs. CityGML is a common modeling language for 3D city objects launched by the Open Geospatial Consortium (OGC) in 2008. It defines city objects such as buildings and infrastructure in terms of topographic object information, semantic information and appearance properties (Gröger and Plümer 2012). For efficient visualization and data analysis, CityGML supports five distinct LoDs for buildings in city models (Gröger et al. 2012), ranging from 2.5-dimensional regional models (LoD0) to detailed building models with interior information (LoD4). The LoDs in CityGML contain many features, such as building interiors and furniture, that LoDs defined in other GIS schemas do not. Different LoDs can be stored for the same object simultaneously, which provides different resolutions for viewing and analysis. Although the official CityGML Encoding Standard describes each LoD, it offers no clear definitions, so users often create models in different LoDs according to their own understanding. In addition, the CityGML Encoding Standard does not provide methods to transform between LoDs.

To achieve a complete and accurate transformation between LoDs in CityGML, this study first gives a precise definition of each LoD in CityGML, which the CityGML Encoding Standard does not provide. A new exterior shell extraction algorithm is proposed to simplify buildings with interior features, and the transformation between LoDs is built on this algorithm. The framework utilizes the open source citygml4j Java library developed by Claus Nagel (Nagel 2013), which converts CityGML into Java classes and handles the data access, storage and generation processes. To make the transformation between LoDs efficient, the geometric and semantic data parsed by citygml4j are stored in a newly designed data structure on which the new exterior shell extraction algorithm operates.

Extraction of the exterior shell of a building is a common step when simplifying digital building models. Although exterior and interior features should be clearly defined in CityGML, some input models do not distinguish features as interior or exterior. Models transformed from other data formats such as IFC or KML may not contain information on whether a surface is interior or exterior, so the exterior shell has to be found for such models. It is also an essential step in the LoD4 to LoD3 transformation. As discussed in Section 5.2, the common way of extracting the building envelope is to track the footprint. However, this footprint tracking method is not applicable to buildings with overhanging roof parts, as the roof overlays the projection of the walls. Fan et al. (2009) tried to solve this problem by calculating the centroid of the building and comparing the distance of each surface to it. However, that method is not applicable to buildings with a non-convex shape, as the centroid does not necessarily reflect the “center” of the building.

The new exterior shell extraction algorithm uses the same idea as the Ray Tracing algorithm, but to determine the exterior surfaces of a detailed input model, the view points from which the rays are shot change according to the model input. After data processing, the CityGML models are broken into many planar surfaces with corresponding semantic information. Our goal is to find the exterior shell of these surfaces by checking their visibility from the exterior. The algorithm works as follows. First, find the exterior, which in our case is a bounding sphere of the building. This is calculated in O(1) time for each building. Next, the visibility of each surface against points on the bounding sphere is checked. The visibility of the surfaces is determined using the Ray Tracing algorithm, assuming that the observing rays shoot from points on the sphere. If three randomly generated points on a surface are all visible from the bounding sphere, the surface is considered visible and thus part of the exterior shell of the building. To avoid misjudgments where some of the exterior surfaces are not visible from the bounding sphere, the program also checks the contiguity of the generated exterior shell. If holes exist in the shell, the program double-checks the surfaces inside the holes to see whether they are exterior. The process is illustrated in the figure above.
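The visibility test at the core of this procedure can be sketched on a triangle mesh as below. This is a simplified illustration under several assumptions, not the project's implementation: surfaces are triangles, the sample points are three fixed interior points instead of random ones, the view points are spread over the sphere with a Fibonacci lattice, the sphere radius is padded by a hypothetical `margin` factor, and the contiguity check for holes is omitted. Occlusion is tested with the standard Möller-Trumbore segment-triangle intersection.

```python
import math


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def segment_hits_triangle(origin, target, tri, eps=1e-9):
    """Moller-Trumbore test: does the segment origin -> target cross tri before reaching target?"""
    v0, v1, v2 = tri
    d = sub(target, origin)
    e1, e2 = sub(v1, v0), sub(v2, v0)
    h = cross(d, e2)
    det = dot(e1, h)
    if abs(det) < eps:
        return False  # segment is parallel to the triangle plane
    s = sub(origin, v0)
    u = dot(s, h) / det
    q = cross(s, e1)
    v = dot(d, q) / det
    t = dot(e2, q) / det
    return u >= -eps and v >= -eps and u + v <= 1 + eps and eps < t < 1 - eps


def fibonacci_sphere(center, radius, n):
    """n roughly evenly spread view points on a sphere."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        points.append((center[0] + radius * r * math.cos(golden * i),
                       center[1] + radius * r * math.sin(golden * i),
                       center[2] + radius * z))
    return points


def sample_points(tri):
    """Three fixed interior points of the triangle (the source draws them at random)."""
    weights = [(0.6, 0.2, 0.2), (0.2, 0.6, 0.2), (0.2, 0.2, 0.6)]
    return [tuple(sum(w * v[k] for w, v in zip(ws, tri)) for k in range(3)) for ws in weights]


def exterior_surfaces(triangles, n_viewpoints=64, margin=1.5):
    """Indices of triangles whose three sample points are each visible from some view point."""
    vertices = [v for tri in triangles for v in tri]
    center = tuple(sum(v[k] for v in vertices) / len(vertices) for k in range(3))
    radius = margin * max(math.dist(center, v) for v in vertices)
    viewpoints = fibonacci_sphere(center, radius, n_viewpoints)

    def visible(point, own_index):
        return any(
            not any(segment_hits_triangle(vp, point, tri)
                    for j, tri in enumerate(triangles) if j != own_index)
            for vp in viewpoints)

    return [i for i, tri in enumerate(triangles)
            if all(visible(p, i) for p in sample_points(tri))]


# Toy model: a closed tetrahedron (indices 0-3) with one small triangle inside it (index 4).
A, B, C, D = (0, 0, 0), (4, 0, 0), (0, 4, 0), (0, 0, 4)
triangles = [(A, B, C), (A, B, D), (A, C, D), (B, C, D),
             ((0.5, 0.5, 0.5), (1.0, 0.5, 0.5), (0.5, 1.0, 0.5))]
print(exterior_surfaces(triangles))  # [0, 1, 2, 3]
```

Every pair of view point and sample point is tested against every other triangle, so the cost grows quadratically with the number of surfaces for a fixed number of view points.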

LoD4 models are the most detailed models in CityGML, including almost all the geometric information about the building parts. LoD4 models differ from LoD3 models in CityGML in terms of interior building features, such as rooms, stairs and furniture. The transformation of LoD4 to LoD3 models basically removes these features and changes the LoD4 geometry (i.e. surface, solid, curve) to LoD3 geometry. When preserving the LoD3 features, the program also checks whether each feature complies with the geometric accuracy requirement. The methodology introduced in Fan and Meng (2009) is used to judge whether a feature complies with the geometric accuracy requirements and to decide whether or not to keep it.

A considerable amount of information is lost when transforming a model from LoD3 to LoD2 due to the gap in data requirements between the two LoDs. Openings such as windows and doors are removed in the transformation process. Other features such as outer building installations are also removed.

Buildings in LoD1 are boxes without any roof structure. The LoD2 to LoD1 transformation therefore simply removes all the roof structures and builds the envelope for the remaining walls. Although LoD1 is the typical block model, the shape of the walls is still kept in the model. All the remaining surfaces are written into a solid CityGML LoD1 model.
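The semantic side of this step-wise downgrading can be sketched as a chain of filters. The example below is only an illustration: features are plain dictionaries, the type labels are chosen for the example, and the geometric work (exterior shell extraction, accuracy checks, rebuilding the wall envelope) is deliberately left out.

```python
# Feature types dropped when leaving each LoD (illustrative labels).
REMOVED_WHEN_LEAVING = {
    4: {"Room", "IntBuildingInstallation", "BuildingFurniture"},  # LoD4 -> LoD3: interior features
    3: {"Window", "Door", "BuildingInstallation"},                # LoD3 -> LoD2: openings, outer installations
    2: {"RoofSurface"},                                           # LoD2 -> LoD1: roof structures
}


def downgrade(features, source_lod, target_lod):
    """Drop features step by step from source_lod down to target_lod (semantic filtering only)."""
    if not 1 <= target_lod <= source_lod <= 4:
        raise ValueError("expected 1 <= target_lod <= source_lod <= 4")
    kept = list(features)
    for lod in range(source_lod, target_lod, -1):
        kept = [f for f in kept if f["type"] not in REMOVED_WHEN_LEAVING[lod]]
    return kept


# Hypothetical LoD4 building content.
building = [
    {"type": "WallSurface"}, {"type": "RoofSurface"}, {"type": "Window"},
    {"type": "Door"}, {"type": "Room"}, {"type": "BuildingFurniture"},
]

print([f["type"] for f in downgrade(building, 4, 3)])
# ['WallSurface', 'RoofSurface', 'Window', 'Door']
print([f["type"] for f in downgrade(building, 4, 1)])
# ['WallSurface']
```

Chaining the steps in this way mirrors the down-grading order LoD4, LoD3, LoD2, LoD1, so a direct LoD4 to LoD1 conversion is simply the composition of the three individual steps.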

The proposed LoD transformation framework was implemented in a Java application and tested using a computer with a dual-core Intel i5-2400 CPU (3.1 GHz) and 4GB RAM. The test data sets came from CityGML models generated from BIM models. All the input data sets were LoD4 models. The LoD transformation was performed in a down-grading fashion in which higher LoDs were converted into lower LoDs. The developed LoD transformation framework could successfully transform LoD4 models to different LoDs without information loss, even for buildings with complex geometry.

The transformations from LoD3 to LoD2 and from LoD2 to LoD1 involved few geometric operations, so their running time shows a linear relationship with the model size. The transformation between LoD4 and LoD3 involved the exterior shell extraction, whose running time and memory consumption grow quadratically with the model size. Nonetheless, the running time for most ordinary dwelling 3D CityGML models was less than two seconds on our platform. For a two-story house model, the maximum running time was only about 4 seconds.

The proposed exterior shell extraction algorithm is a critical step in the LoD4 to LoD3 transformation. The traditional footprint correction method and the method based on the building centroid are not applicable to complex 3D building models, because buildings with a non-convex building envelope cannot be handled by such algorithms. The exterior shell extraction algorithm in our framework, on the other hand, uses all the surfaces of the models as input and tries to find the surfaces that are visible from the outside of the building. These surfaces are the exterior surfaces of the building and should be kept in LoD3 models. The algorithm considers all the possible cases of visibility of surfaces against the bounding sphere, so it is applicable even to non-convex shapes. The tests on complex non-convex models indicate the effectiveness of the algorithm: the surfaces inside the corners of buildings can be discovered and kept in the generated LoD3 models.








Use Case 1: Integrating BIM and GIS for Noise Evaluation in Urban Environments

Read the published paper











Use Case 2: Integrating BIM and GIS for Supply Chain Cost Minimization in the AEC Industry

Paper Under Review




While there is abundant literature on the design, application and evaluation of construction supply chains, the effective use of geographic data in construction supply chain management (CSCM) has not been fully explored. GIS is potentially applicable in construction supply chain management to manage spatial information, and provides an ideal solution for managing transportation costs and market analysis in the overall e-commerce activities. In this research, three problems are solved when considering geographic data in the CSCM system. The first problem is supplier selection, where the supplier with the lowest cost may not be the nearest supplier. The second problem is to determine the number of deliveries, as there is a trade-off between the number of deliveries and the inventory cost. Increasing the number of material deliveries decreases the need for onsite inventory, but it increases the delivery fees. The third problem is the location-allocation problem for consolidation centers (CC) that serve construction material to multiple sites. The objective of the integrated framework is to minimize costs in construction supply chains by generating optimized solutions for selecting supplier sites, determining the number of deliveries and allocating consolidation centers. Information from 4D BIM is used to determine the construction activities and their material demands. A bidding and requisition module is developed based on GIS to allow suppliers to bid for material orders. Based on the actual travel distances from the supplier sites to the construction sites, the integrated framework calculates the delivery fees that are used to optimize three solutions, namely the supplier selection, the number of deliveries and the allocation of consolidation centers.
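The first two problems can be made concrete with a toy cost model. The sketch below is a deterministic simplification written for illustration only: the cost formulas, parameter names and numbers are assumptions and are not the formulation or the data used in this research.

```python
def supplier_total_cost(unit_price, quantity, distance_km, deliveries, fee_per_km):
    """Assumed cost model: material cost plus a distance-based fee for every delivery."""
    return unit_price * quantity + deliveries * distance_km * fee_per_km


def best_supplier(suppliers, quantity, deliveries, fee_per_km):
    """Supplier with the lowest total cost under the assumed model."""
    return min(
        suppliers,
        key=lambda s: supplier_total_cost(
            s["unit_price"], quantity, s["distance_km"], deliveries, fee_per_km),
    )


def best_delivery_count(quantity, fee_per_delivery, holding_cost_per_unit, max_deliveries):
    """Number of deliveries minimizing delivery fees plus holding cost of the average stock."""
    def total(n):
        average_stock = quantity / n / 2  # stock falls from quantity / n to 0 between deliveries
        return n * fee_per_delivery + holding_cost_per_unit * average_stock
    return min(range(1, max_deliveries + 1), key=total)


# Hypothetical bids: the cheaper unit price comes with a much longer haul.
suppliers = [
    {"name": "near", "unit_price": 10.5, "distance_km": 5},
    {"name": "far", "unit_price": 10.0, "distance_km": 60},
]

print(best_supplier(suppliers, quantity=100, deliveries=4, fee_per_km=2)["name"])  # near
print(best_delivery_count(quantity=100, fee_per_delivery=50,
                          holding_cost_per_unit=8, max_deliveries=10))  # 3
```

In this example the supplier with the lower unit price loses once delivery fees are included, and the total cost of deliveries plus inventory is lowest at an intermediate number of deliveries rather than at either extreme.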

The construction supply chain management process requires massive data input as well as reliable analytical functions to support management decision-making. Therefore, a CSCM framework should have at least three layers: the data retrieval and storage layer, the analysis layer, and the management interface. The use of 4D BIM and GIS provides opportunities to minimize the manual effort for data input, and all data can be stored in the geo-databases in GIS with customized analytical functions. A comprehensive 4D BIM model stores the full range of information about activities with their associated material demand and duration. In the proposed BIM-GIS integration CSCM framework, a detailed quantity takeoff of the construction project is executed from BIM at the early stage of procurement, and GIS supports the wide range of analysis functions needed for decision-making in the CSCM process. The data from BIM are exported to databases linked to GIS, where they are used for analysis. The manually input data are mostly related to the quotations for material from multiple suppliers and to changes in current project plans. It is possible that some construction activities, for example scaffolding, are not present in the 4D BIM. In this case, such activities have to be input manually along with their material demand and time duration. In the proposed CSCM framework, the data storage, analysis and decision-making are performed in a GIS system. GIS has the capacity to store massive amounts of data and provides fundamental analysis tools such as route finding for delivery cost analysis.

As shown in the figure on the left, we can find consolidation centers with low cost. By using data from BIM and GIS, we formulate the three problems in mathematical form and provide solutions to them. Specifically, we make the following contributions. Firstly, we prove that the selection of suppliers should not be based on a single factor such as delivery distance or unit price alone. Secondly, we show that the number of material deliveries has an impact on the total invoice cost of the supply chain, and we provide a Monte-Carlo Simulation solution to this problem. Thirdly, using mathematical modelling, we prove the necessity of setting up consolidation centers given congested sites and long delivery distances. Finally, we provide a solution to the location-allocation problem of setting up consolidation centers. Notably, none of these contributions could be made without the data inputs and analysis functions in BIM and GIS.








Use Case 3: Integrating BIM and GIS for Underground Utility Management

Read the published paper











List of Related Publications




Dr. Jack CHENG'S Google Scholar Page

Dr. Chimay ANUMBA'S Google Scholar Page

Dr. Yichuan DENG'S Google Scholar Page

Publications from this project:

1. Cheng, J. C.; Lu, Q.; Deng, Y., Analytical review and evaluation of civil information modeling. Automation in Construction 2016, 67, 31-47.

2. Deng, Y.; Cheng, J. C.; Anumba, C., A framework for 3D traffic noise mapping using data from BIM and GIS integration. Structure and Infrastructure Engineering 2016, 1-14.

3. Deng, Y.; Cheng, J. C.; Anumba, C., Mapping between BIM and 3D GIS in different levels of detail using schema mediation and instance comparison. Automation in Construction 2016, 67, 1-21.

4. Deng, Y.; Cheng, J. C. P., Construction supply chain coordination leveraging 4D BIM and GIS integration. Proceedings of the CIB World Building Congress 2016, Tampere, Finland 2016.

5. Cheng, J.; Deng, Y.; Anumba, C., Mapping BIM schema and 3D GIS schema semi-automatically utilizing linguistic and text mining techniques. Journal of Information Technology in Construction (ITcon) 2015, 20, 193-212.

6. Cheng, J. C.; Deng, Y., An Integrated BIM-GIS Framework for Utility Information Management and Analyses. Proceedings of the 2015 ASCE International Workshop on Computing in Civil Engineering (IWCCE 2015), University of Texas at Austin, Austin, United States 2015, 667.

7. Cheng, J. C.; Deng, Y., Modeling and management of utility information using a 3D BIM-GIS integration framework. The Second International Conference on Sustainable Urbanization (ICSU 2015) 2015.

8. Cheng, J. C. P.; Deng, Y., Automatic transformation of different levels of detail in 3D GIS city models. International Journal of 3-D Information Modeling 2015, 4 (3), 1.

9. Deng, Y. Mapping of BIM and GIS for Interoperable Geospatial Data Management and Analysis for the Built Environment. Hong Kong University of Science and Technology, 2015.

10. Deng, Y.; Das, M.; Cheng, J. C. P., A framework for integrating energy simulation data in building information modeling. Proceedings of the 3rd International Conference on Civil Engineering, Architecture and Sustainable Infrastructure (ICCEASI 2015), Hong Kong, China 2015.

11. Cheng, J. C.; Deng, Y.; Das, M.; Anumba, C., Evaluation of IFC4 for the GIS and Green Building Domains. Computing in Civil and Building Engineering (2014) 2014, 2216-2223.

12. Deng, Y.; Cheng, J., Integrating BIM and GIS for Urban Planning Purposes Considering Acoustics. In The Twenty-seventh KKHTCNN Symposium on Civil Engineering, Shanghai, China, 2014.

13. Cheng, J. C. P.; Deng, Y.; Du, Q., Mapping Between BIM Models and 3D GIS City Models of Different Levels of Detail. In 13th International Conference on Construction Applications of Virtual Reality, London, United Kingdom, 2013.

14. Cheng, J. C. P.; Deng, Y., Mapping BIM models and 3D GIS models using schema matching and linguistic methods. The 12th International Conference on Construction Applications of Virtual Reality (CONVR 2012), Taipei, Taiwan 2012.

Awards Showcase








 




Meet Our Team




Jack C.P. CHENG (鄭展鵬), PhD, Stanford University

Associate Professor

Department of Civil and Environmental Engineering

The Hong Kong University of Science and Technology

Email: cejcheng@ust.hk

RESEARCH INTERESTS:

•Construction information technology and knowledge management

•Building information modeling (BIM)

•3D Geographic Information System (GIS)

•Internet of Things (IoT)

•Carbon footprint measurement and auditing

•Green buildings and sustainable construction

•Smart construction and smart city

 

 

Chimay J. Anumba, FREng., Ph.D., D.Sc., Dr.h.c., P.E.

Dean and Professor, College of Design, Construction and Planning

University of Florida

Email: anumba@dcp.ufl.edu

RESEARCH INTERESTS:

• Advanced Engineering Informatics

• Artificial Intelligence

• Knowledge-Based Systems/Knowledge Management

• Concurrent Engineering in Construction

• Facility Management

• Architectural Engineering

 

 

 

Yichuan Deng, PhD

Assistant Professor

Department of Engineering Management

South China University of Technology

Email: ctycdeng@scut.edu.cn

RESEARCH INTERESTS:

•Building information modeling (BIM)

•Construction supply chain management

•Reality capture (drones)

•Computer vision








Download the BIM-GIS Convertor (Beta)

Click here to download the convertor




Note to Users

Please note that the parser for CityGML/IFC files was developed under the following settings and assumptions:

• Works for CityGML schema 2.0 and IFC 2x3; support for other versions will be developed depending on demand

• Tested with IFC models generated from Revit

• Works with Java 1.8 and a 64-bit operating system

• In IFC, we assume that the sweeping direction of swept solid is perpendicular to the sweeping plane.

Please contact cejcheng@ust.hk or ycdeng@connect.ust.hk for enquiries.

User Guide for the Standalone Translator with GUI

Extract the downloaded .zip file to a single folder. Please make sure that jsdai.properties and IFC Temp.ifc are in the same folder as the GUI jar file.

Double click the jar file and you can see a GUI like this:

 

 

 

 

 

 

 

Select the location of the CityGML/IFC file to be translated, and specify the location of the output file to be created. Please remember to include the file extensions (.xml/.ifc). Click the button for the desired translating direction to start the translator.

 

To learn how the translator works, please refer to our paper: Deng, Y.; Cheng, J. C. P.; Anumba, C. J., Mapping between BIM and 3D GIS in different levels of detail using schema mediation and instance comparison. Automation in Construction 2016, 67, 1-21.








All rights reserved. Copyright © HKUST, 2016