China Mobile Communications Corporation, referred to as "China Mobile," is China's largest basic telecommunications operator, leading the country in both mobile and fixed-network subscriber numbers. In recent years, advances in broadband access and mobile communication technologies have accelerated China Mobile's growth in the Internet business sector.
Back in 2018, following the prescribed policies and regulations, the Henan branch of China Mobile took the lead in initiating the construction of internet log tracing measures for users. By introducing HashData's cloud-native data warehouse, the Henan branch of China Mobile successfully implemented log querying and content auditing for users across the entire network, effectively addressing a large number of related requirements.
At the beginning of the construction project, the Henan branch of China Mobile oriented its technological approach towards robust storage and streamlined analysis. It adopted a traditional architecture based mainly on the Hadoop technology stack, integrating MapReduce (MR), Hive SQL, HDFS, and Flume.
This approach brought about three significant issues. Firstly, the explosive increase in stored data led to a decrease in effective data utilization and pushed up the cost of data storage. Secondly, because computing capacity and storage were tightly coupled, the system couldn't easily scale its storage space. The cluster architecture also had weak data analysis capabilities, which left the application side unable to implement various data fusion analyses and resulted in insufficient concurrency and low query efficiency. Lastly, there were significant bottlenecks in storage access. The system couldn't scale to massive data volumes on demand, and the high operational and construction costs made it hard to meet the demands of analyzing a vast amount of log data.
In the face of these challenges, the Henan branch of China Mobile outlined a fresh strategy for the project: it aimed to build a centralized log storage system that, while aligning with relevant business needs, also factored in future system expansions. It planned to unify data collection, consolidating it on a provincial log storage platform. This platform would handle unified storage, data analysis, and distribution of log data, providing a variety of data services to different applications as required. This fresh approach set higher targets for the architecture rebuild, aspiring to achieve the following capabilities:
While ensuring a clear logical correspondence between computing units and data storage, and keeping the cluster I/O throughput, HashData has smartly designed a caching strategy. This, along with separating computing from storage, allows HashData to offer high availability, multi-dimensional elasticity, and extensive scalability.
HashData has also introduced a new cloud-native structure: metadata state that is kept independent of the computing nodes. Removing the metadata state from the computing nodes makes these nodes completely stateless (the new shared-everything MPP architecture, as opposed to the traditional shared-nothing architecture), while preserving access to all data and metadata even when new nodes are added.
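The idea of stateless computing nodes can be illustrated with a small toy model. This is only a sketch under stated assumptions, not HashData's implementation: the `ObjectStore`, `MetadataService`, and `ComputeNode` classes and the `logs/part-N` keys are hypothetical names invented for the illustration. It shows that when metadata and data both live outside the node, a newly added node can answer the same query with no data migration, and that the node's local cache is disposable.

```python
class ObjectStore:
    """Shared storage (hypothetical): maps object keys to lists of rows."""

    def __init__(self):
        self._objects = {}

    def put(self, key, rows):
        self._objects[key] = list(rows)

    def get(self, key):
        return self._objects[key]


class MetadataService:
    """Keeps the table -> object keys mapping outside any compute node."""

    def __init__(self):
        self._tables = {}

    def register(self, table, key):
        self._tables.setdefault(table, []).append(key)

    def objects_for(self, table):
        return list(self._tables.get(table, []))


class ComputeNode:
    """Stateless worker: owns no data or metadata, only a disposable cache."""

    def __init__(self, name, metadata, store):
        self.name = name
        self._metadata = metadata
        self._store = store
        self._cache = {}
        self.cache_hits = 0

    def _read(self, key):
        # Serve from the local cache when possible, else fetch from shared storage.
        if key in self._cache:
            self.cache_hits += 1
        else:
            self._cache[key] = self._store.get(key)
        return self._cache[key]

    def count_rows(self, table):
        keys = self._metadata.objects_for(table)
        return sum(len(self._read(key)) for key in keys)


def demo():
    store, metadata = ObjectStore(), MetadataService()
    store.put("logs/part-0", ["a", "b"])
    store.put("logs/part-1", ["c"])
    metadata.register("logs", "logs/part-0")
    metadata.register("logs", "logs/part-1")

    node_a = ComputeNode("a", metadata, store)
    first = node_a.count_rows("logs")    # cold read from shared storage
    second = node_a.count_rows("logs")   # both objects served from cache

    node_b = ComputeNode("b", metadata, store)  # added later, nothing to migrate
    third = node_b.count_rows("logs")
    return first, second, third, node_a.cache_hits, node_b.cache_hits
```

In this toy model, losing a node costs only its cache, which is the property that lets computing capacity be added or removed independently of storage.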
The main strategy of this new framework is to "virtualize storage resources while maximizing computing resources". This maximizes the support for data forwarding and analysis for the log retention platform, especially when dealing with rapidly growing data.
HashData's product replaces Flume with its built-in ETL tool and HDFS with object storage. Hadoop's cleansing and computing steps are replaced by user-defined functions (UDFs), and standard SQL together with UDFs takes the place of MapReduce jobs. Log retention processing runs on the product's built-in directed acyclic graph (DAG) data structure and algorithm.
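To make the "standard SQL and UDFs in place of MapReduce" pattern concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module purely as a stand-in engine, since the source does not show HashData's own UDF syntax. The `access_log` table, its columns, and the `extract_host` function are all hypothetical. The UDF performs the cleansing step, and a single declarative query performs the aggregation that would otherwise need a map and a reduce phase.

```python
import sqlite3
from urllib.parse import urlsplit


def extract_host(url):
    """Hypothetical cleansing UDF: return the lowercase host of a URL, or None."""
    if not url:
        return None
    host = urlsplit(url).hostname
    return host.lower() if host else None


def top_hosts(rows, limit=3):
    """Load (user_id, url, bytes) rows and rank hosts by hit count."""
    conn = sqlite3.connect(":memory:")
    # Register the Python function so SQL can call it (sqlite3 stand-in only).
    conn.create_function("extract_host", 1, extract_host)
    conn.execute("CREATE TABLE access_log (user_id TEXT, url TEXT, bytes INTEGER)")
    conn.executemany("INSERT INTO access_log VALUES (?, ?, ?)", rows)
    query = """
        SELECT extract_host(url) AS host,
               COUNT(*)          AS hits,
               SUM(bytes)        AS total_bytes
        FROM access_log
        WHERE extract_host(url) IS NOT NULL   -- drop malformed records
        GROUP BY host
        ORDER BY hits DESC, host ASC
        LIMIT ?
    """
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result


# Made-up sample records, including two that the UDF rejects.
SAMPLE_ROWS = [
    ("u1", "http://Example.com/a", 100),
    ("u2", "http://example.com/b", 250),
    ("u1", "https://news.example.org/x", 50),
    ("u3", "not a url", 10),
    ("u3", None, 5),
]

if __name__ == "__main__":
    for host, hits, total_bytes in top_hosts(SAMPLE_ROWS):
        print(host, hits, total_bytes)
```

The design point is that the cleansing logic lives in one small function while grouping, sorting, and filtering are left to the SQL engine, rather than being hand-written as separate job stages.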
In the product architecture of computing-storage separation, computing and storage can be scaled out independently, offering more flexibility and significantly cutting down storage costs. HashData has fully rolled out the log retention system. While retaining the same volume of data as the original Hadoop system, it uses just about 40% of the original cluster size. The development cycle is 50% shorter, and query performance has increased tenfold, effectively meeting cost-cutting and efficiency-increasing objectives.
HashData merges the performance and rich analysis capabilities of MPP databases with the scalability and flexibility of big data platforms, plus the elasticity and agility of cloud computing. This blend of advantages has facilitated the construction of a next-generation enterprise-level cloud data warehouse for the Henan branch of China Mobile, truly optimizing costs and boosting efficiency. Looking ahead, the two parties will continue their collaboration journey on the digital path of telecommunications technology.
Copyright© 2024 HashData Technology (Hong Kong) Limited - All rights reserved.