|Untagged||23 Feb 2012 10:26 AM|
|Data Management for the Rest of Us by kvandersluis|
The Three W’s of Data Management
As companies increasingly treat data as a valuable asset, Data Management practices have become more visible in the enterprise. Having specialized in integration for a long time, I have developed a very integration-specific view of the information technology world. Over time, I would observe data management activities in various companies and think, with relative indifference: what is so special about data management? You have a large, complex database, but nothing interesting happens until the data actually moves some place. To my thinking, data is simply one of the raw materials used in the process of making the business work. It’s the applications, fueled by data moving through them, that implement the business processes. And ultimately, IT serves no other purpose than to automate business processes.
In my most recent client engagement, I was hired as a data integration expert, but because of staff changes, I soon moved into a more general data management and architecture position. I consulted with various technology groups within the company to counsel on topics like data integration, operational data stores, data warehouses, metadata management, and other data management disciplines. Our principles were guided by the Data Management Association (www.DAMA.org), and were also influenced greatly by Bill Inmon’s Corporate Information Factory strategy for Data Warehousing (www.inmoncif.com).
If you’re a data integration specialist like me, finding yourself amidst data management people is a new and unsettling experience. But over the last months, having sifted through many concepts across the data management disciplines, I have been able to organize the core ideas into distinct categories, and at the same time identify the position of integration technologies within the data management landscape.
3 W’s of Data Management
Data Management concepts can be organized by asking ourselves three basic questions about data in our enterprise. The questions posed are the What, Where, and the How of Data Management. What is Our Data? Where is Our Data Stored? How Does Data Move Through the Enterprise? Posing these questions and contemplating the answers leads to an understanding of the major categories concerning the Data Management profession. In addition, certain disciplines cut across these categories. As you answer the What, Where, and How of data, you must also consider the required degrees of security, data quality, governance, and metadata management, commensurate with business needs.
What is Our Data?
What is our business data? We need to establish commonly understood terms and definitions, and their relationships. This information should be centralized into a common business glossary, readily accessible to both business people and technologists. The glossary becomes the common vocabulary by which accurate communications take place across the organization. When the business requests new features in a software system, the common vocabulary is used to convey the desired features. Business and IT need a shared understanding of the terms used, and the glossary provides this.
Beyond the glossary is a much more in-depth and formal study of the information in the enterprise. Information needs to be modeled using standard techniques that have evolved for this purpose. This typically leads to entity-relationship models constructed using tools such as ER Studio or ERwin. A company will typically take a top-down approach, first identifying subject areas for the business information. These are the core information categories which are the subjects of business processing. In banking, for example, subject areas would include Customer, Account, and Loan. Data Modelers then dive into the details, and proceed with building formal ER models of the information. Modeling starts at the conceptual level by defining entities and their relationships. From there, detail is added to arrive at logical models, which include attributes for the entities. Lastly, physical modeling takes the logical model and creates an actual implementation on a database like Oracle, SQL Server, or MySQL. As is common in the long-lived enterprise, legacy applications and databases have existed for a long time, so the newer models are unfortunately more a reflection of how we want the data to be defined and organized than of how it actually is defined and organized. Rationalizing the existing systems against the desired models is usually a long-term task that seemingly is never finished.
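To make the step from logical to physical model a little more concrete, here is a minimal sketch in Python. It is not how ER Studio or ERwin work internally. The `Entity` and `Attribute` classes, the `PHYSICAL_TYPES` mapping, and the `to_ddl` function are all hypothetical names invented for this illustration, and the generated SQL is generic rather than tuned to Oracle, SQL Server, or MySQL.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    """A logical-model attribute: a name plus a database-independent type."""
    name: str
    logical_type: str  # e.g. "identifier", "text", "money"
    required: bool = True


@dataclass
class Entity:
    """A logical-model entity: a conceptual entity with attributes added."""
    name: str
    attributes: list[Attribute] = field(default_factory=list)


# Hypothetical mapping from logical types to generic SQL column types.
# A real physical model would choose types for one specific database.
PHYSICAL_TYPES = {
    "identifier": "INTEGER",
    "text": "VARCHAR(100)",
    "money": "DECIMAL(12, 2)",
}


def to_ddl(entity: Entity) -> str:
    """Derive a CREATE TABLE statement (physical model) from a logical entity."""
    columns = []
    for attr in entity.attributes:
        null_clause = " NOT NULL" if attr.required else ""
        sql_type = PHYSICAL_TYPES[attr.logical_type]
        columns.append(f"    {attr.name} {sql_type}{null_clause}")
    return f"CREATE TABLE {entity.name.lower()} (\n" + ",\n".join(columns) + "\n);"


# The Account subject area from the banking example, with assumed attributes.
account = Entity("Account", [
    Attribute("account_id", "identifier"),
    Attribute("nickname", "text", required=False),
    Attribute("balance", "money"),
])

print(to_ddl(account))
```

The point of the sketch is the separation of concerns: the logical model says nothing about a particular database, and only the mapping step commits to physical column types.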
Data Modelers, Data Architects, and Business Analysts play key roles in the “what” of data management.
Where is Our Data Stored?
After understanding what our data is, we need to know where the data is stored across the enterprise. The “where” of data management demands that we understand all the data stores used throughout the enterprise. These technologies span relational database management systems, document repositories, email systems, and even basic file systems. Front-line systems process the core business transactions that essentially run the business. Data usually proceeds from there into centralized repositories, warehouses, and data marts where it is used to manage and plan the business. These repositories can be internal, hosted elsewhere, or even cloud-hosted. It’s important to recognize that data has its own lifecycle. For any given type of information, we need to identify which systems create it, and which can update, read, and delete it. Data in these repositories is subject to certain management activities like access control, encryption, replication, failover, backup, and archival.
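One simple way to record which systems create, read, update, and delete a given type of information is a CRUD matrix. The sketch below is only an illustration: the system names (`crm`, `billing`, `loan_origination`, `warehouse`) and the `systems_that` helper are made up for the example and do not describe any particular company.

```python
# Hypothetical CRUD matrix: data type -> system -> operations it may perform.
CRUD_MATRIX = {
    "Customer": {
        "crm": {"create", "read", "update"},
        "billing": {"read"},
        "warehouse": {"read"},
    },
    "Loan": {
        "loan_origination": {"create", "read", "update", "delete"},
        "warehouse": {"read"},
    },
}


def systems_that(operation: str, data_type: str) -> list[str]:
    """Return, in sorted order, the systems allowed to perform `operation` on `data_type`."""
    systems = CRUD_MATRIX.get(data_type, {})
    return sorted(name for name, ops in systems.items() if operation in ops)


print(systems_that("create", "Customer"))  # ['crm']
print(systems_that("read", "Loan"))        # ['loan_origination', 'warehouse']
```

Even a small table like this answers lifecycle questions quickly, such as which system is the sole creator of Customer data.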
Enterprise architects, data architects and database administrators play key roles in the “where” of data management. Enterprise and data architects map out the landscape of systems and applications within the enterprise to help align with business goals. Database administrators are the custodians of data as it sits in a repository, responsible for applying the proper management principles to guarantee performance, security, availability, and preservation.
How Does Data Move Through the Enterprise?
The "what" and "where" of data management largely deal with "data at rest" in transactional data stores, operational data stores, warehouses, content management systems, and data marts. Data's alter ego in the enterprise is "data in motion", which shows us how data moves through the enterprise. We need to understand the paths taken by data as it flows through the various systems, ultimately leading to products being shipped, payments being received and booked into financial systems, and management reports being generated. Systems integration has been practiced for as long as companies have had multiple systems. Systems interact using a wide array of technologies, including messaging, services, file transfer, and shared databases, just to name a few. From a data management perspective, knowing the flow of data through the systems is critical to understanding the state of the data and the risks it is subjected to. This study also unlocks the potential of data, pointing to new uses in support of the business.
Understanding the data movement in the enterprise leads to a broader understanding of the data management world with respect to the different states of data. Data management activities like modeling, glossary development, and operational management have historically focused on data at rest, and have had little influence on the world of data in motion. Integration specialists along with a mature set of software tools have been successfully managing data in motion for over a decade. Yet the two worlds are still quite different, and much work is yet to be done to bring the formalisms from traditional Data Management disciplines for data at rest over to the world of data in motion. A comprehensive Data Management policy must unify data at rest with data in motion.
At the heart of a Data Management program is managing the byproducts of the activities described here, and those byproducts are largely metadata. Metadata is information about other data. For example, data residing in a database is described by the tables and columns which define its layout. In fact, most of the information produced from the activities I’ve described in this paper is metadata. The logical model, the physical model, the business glossary, and the XML Schema definitions describing the format of data in motion are all examples of key metadata in the enterprise. It is Metadata Management that ties it all together, providing a unified view of how data works across the enterprise. Metadata is stored and related in a Metadata Repository (MDR). The ability of an MDR to store and relate all types of models is critical to the success of metadata management, and of the overall data management program itself.
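The distinction between data and metadata is easy to see in a small database. The following sketch uses Python's built-in `sqlite3` module with an in-memory database. The `account` table and its columns are assumptions chosen for the example. The first query returns the data itself, while `PRAGMA table_info` returns the metadata describing the table's layout.

```python
import sqlite3

# An in-memory SQLite database with a hypothetical "account" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance REAL NOT NULL)"
)
conn.execute("INSERT INTO account (account_id, balance) VALUES (1, 250.0)")

# The data: the rows stored in the table.
rows = conn.execute("SELECT account_id, balance FROM account").fetchall()

# The metadata: PRAGMA table_info yields one row per column, shaped as
# (cid, name, type, notnull, dflt_value, pk).
columns = [
    (name, col_type)
    for _cid, name, col_type, _notnull, _default, _pk
    in conn.execute("PRAGMA table_info(account)")
]

print("data:", rows)
print("metadata:", columns)
```

A metadata repository collects this kind of description from many sources, not just one database, and relates it to the models and glossary terms it corresponds to.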
The critical functions provided by the MDR include lineage tracing, impact analysis, business glossary management, and support for reuse. When a user views a business report and wonders what a particular data field represents, the glossary defines the term for her, and lineage analysis shows where the data came from, tracing it from the original source through each system and transformation that touched it. When IT wishes to alter a database table, impact analysis identifies all the downstream systems, reports, and data feeds that are impacted. Finally, the MDR becomes the central repository where IT and business can view the assets available to use for new purposes.
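Lineage tracing and impact analysis can both be thought of as walks over a graph of data flows: lineage walks upstream from an asset, and impact analysis walks downstream. The sketch below assumes the flows are already recorded as simple (source, target) pairs. The asset names and the `lineage` and `impact` functions are hypothetical, and a real MDR would track far more detail, such as column-level mappings and transformations.

```python
from collections import deque

# Hypothetical data flows recorded as (source, target) pairs.
FLOWS = [
    ("orders_db", "ods"),
    ("crm", "ods"),
    ("ods", "warehouse"),
    ("warehouse", "sales_mart"),
    ("sales_mart", "revenue_report"),
]


def _reachable(start, edges):
    """Breadth-first search: every node reachable from `start` along `edges`."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def impact(asset):
    """Downstream assets affected if `asset` changes."""
    return _reachable(asset, FLOWS)


def lineage(asset):
    """Upstream assets that feed `asset`, found by walking the flows in reverse."""
    return _reachable(asset, [(dst, src) for src, dst in FLOWS])


print(sorted(impact("ods")))              # ['revenue_report', 'sales_mart', 'warehouse']
print(sorted(lineage("revenue_report")))  # ['crm', 'ods', 'orders_db', 'sales_mart', 'warehouse']
```

Because both questions use the same graph, recording data flows once in the repository serves the business user asking where a number came from and the IT team asking what a table change will break.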
Data Management is all about treating data as a valuable asset. In most businesses, data is critical to business operations. Disruptions in data flow, misinterpretations, unauthorized access, and data loss can have a significant detrimental effect on the business, sometimes a fatal one.
As we contemplate the activities of Data Management, we must also not forget the role we play as information technologists. Information is at the core of every significant business. But we are not the business. Instead we play a support role. If you’re in retail, the company goal is to sell product. In Defense, you protect the homeland and its interests. In financial services, you make money by moving and manipulating money in its various forms, and you sell products that do the same. In all industries, money doesn’t come in the door simply by storing data and moving it around. It’s important to realize the role of Data Management, and the more general role of IT within the enterprise. These are support roles. You should strive to understand the business, and be in a position to partner with the business to instill confidence and align the efforts of IT with the goals of the business. Confidence in IT spurs an open flow of ideas, where technology capabilities spark new business ideas that business and IT can jointly exploit to further the goals of the business.