Geographic information covers all space-related subjects, which makes it a very broad discipline. Long-term geographic observation produces huge amounts of data. Traditional relational database management systems (RDBMS) are not capable of loading such volumes of data, let alone supporting high-quality analysis. The distributed processing offered by cloud computing technology provides a solution to this problem. In brief, this approach relies on remote data storage, analysis, and computing.
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key (Dean, 2004). MapReduce is inspired by the map and reduce primitives present in Lisp and many other functional languages (Dean, 2004). The MapReduce library in the user program first splits the input into M pieces, typically 16 to 64 megabytes (MB) each (controllable by the user via an optional parameter). It then starts up many copies of the program on a cluster of machines. Because the library handles these details, programmers without any experience with parallel and distributed systems can easily utilize the resources of a large distributed system.
MapReduce Concept Map
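The map, group-by-key, and reduce steps described above can be sketched in a single process. This is only an illustration of the programming model, not the Hadoop or Google implementation: the names `split_input`, `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, the word-count task is chosen purely as an example, and the input is split by record count rather than by megabytes.

```python
from collections import defaultdict


def split_input(records, m):
    """Split the input records into at most m roughly equal pieces."""
    size = max(1, -(-len(records) // m))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_fn(key, value):
    """Map: turn one (key, text) pair into intermediate (word, 1) pairs."""
    for word in value.split():
        yield word.lower(), 1


def reduce_fn(key, values):
    """Reduce: merge all intermediate values that share the same key."""
    return key, sum(values)


def run_mapreduce(records, m=2):
    """Run map, group-by-key, and reduce sequentially (no real cluster)."""
    groups = defaultdict(list)
    for piece in split_input(records, m):
        # On a real cluster, each piece would be handled by a separate worker.
        for key, value in piece:
            for out_key, out_value in map_fn(key, value):
                groups[out_key].append(out_value)
    return dict(reduce_fn(k, vs) for k, vs in sorted(groups.items()))


if __name__ == "__main__":
    docs = [(0, "lunar surface data"), (1, "lunar spectra data"), (2, "Lunar data")]
    print(run_mapreduce(docs))
```

In a distributed run, the grouping step is where intermediate pairs are shuffled between machines; here it is reduced to a dictionary so the flow of keys and values stays visible.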
Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers (Chang, 2006). HBase is the Hadoop database, suited to applications that need random, real-time read/write access to Big Data. The project's goal is to host very large tables (billions of rows by millions of columns) on clusters of commodity hardware. HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable. Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop (Apache project, 2011).
HBase Database Model
To put it simply, HBase can be reduced to a Map
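One common way to read the map view of HBase is as a nested sorted map: row key, then column family, then column qualifier, then timestamp, leading to a value. The sketch below models that idea with plain Python dictionaries. It is an assumption-laden illustration, not the HBase client API: the class `TinyTable`, its methods, and the sample row keys and column names are all hypothetical.

```python
from collections import defaultdict


class TinyTable:
    """In-memory toy model: row -> family -> qualifier -> {timestamp: value}."""

    def __init__(self):
        self._rows = defaultdict(
            lambda: defaultdict(lambda: defaultdict(dict))
        )

    def put(self, row, family, qualifier, value, timestamp):
        """Store one versioned cell value."""
        self._rows[row][family][qualifier][timestamp] = value

    def get(self, row, family, qualifier, timestamp=None):
        """Return the newest value at or before timestamp (latest if None)."""
        versions = self._rows.get(row, {}).get(family, {}).get(qualifier, {})
        candidates = [
            ts for ts in versions if timestamp is None or ts <= timestamp
        ]
        if not candidates:
            return None
        return versions[max(candidates)]

    def scan(self, start_row, stop_row):
        """Yield row keys in sorted order within [start_row, stop_row)."""
        for row in sorted(self._rows):
            if start_row <= row < stop_row:
                yield row


if __name__ == "__main__":
    table = TinyTable()
    # Hypothetical row keys and column names, used only for illustration.
    table.put(b"rev0002", b"sp", b"spectrum", b"v1", timestamp=1)
    table.put(b"rev0002", b"sp", b"spectrum", b"v2", timestamp=2)
    table.put(b"rev0001", b"sp", b"spectrum", b"v0", timestamp=1)
    print(table.get(b"rev0002", b"sp", b"spectrum"))
    print(list(table.scan(b"rev0001", b"rev0003")))
```

Keeping rows in sorted key order is what makes range scans cheap in this model, and the innermost timestamp map is what "versioned" refers to.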
The Japan Aerospace Exploration Agency (JAXA) launched "KAGUYA (SELENE)" on an H-IIA launch vehicle in 2007. The project focuses on collecting lunar surface topography data, performing spectral response analysis of lunar geology, and searching for evidence of moisture on the lunar surface. During the nominal and extended operation periods, the Spectral Profiler (SP) acquired data from about 7,000 revolutions around the Moon, and the total number of lunar surface spectra obtained is close to seventy million.
Using the Hadoop cloud platform and the HBase distributed database to build the SELENE data cloud
The SkyEyes Smart Transportation Management Platform has been used by Formosa Plastics Transport Corporation since 2001. Having accumulated tens of billions of driving records, it has been called the largest driving database in the country. The previous solution relied on traditional commercial relational databases, which were unable to handle the 10 million new records added every day. Driving records dating back 3 months or more had to be stored separately on magnetic media, which was inconvenient for clients interested in looking up older records and thus reduced the quality of customer service.
Formosa Plastics Transport Corporation has used this platform to import 10 billion records of historical traffic data, getting results with very little hardware investment. Searching for an online driving record takes less than one second, a significant performance improvement over the previous approach. Moreover, historical traffic data can be used as a basis for travel time estimation.