We create big data as soon as we open our first app in the morning, switch on our digital radio and, in some cases, even when we brush our teeth with a connected toothbrush. A stream of data pours out of our day-to-day digital lives and into vast server farms across the globe. At Deutsche Bahn, too, the volume of data is now enormous. This is hardly surprising: every ticket sale, every train journey, every train and every location in the DB infrastructure continuously generates new data, 24 hours a day, 365 days a year.
Big data refers to data volumes that are so large they can no longer be analysed using conventional data processing methods. This superabundance of data is the fingerprint of our digitised world. Each of the roughly eight billion people on Earth would need 22 brains to store all the information currently available globally. And big data continues to grow: increasing every two days by the amount of information accumulated from the advent of writing to the year 2000.
The reason for this veritable explosion of data is that more and more people are using digital technology to share information. When companies started to network things as well, this sizeable flow of data became a raging torrent. Today, industrial robots communicate on production lines, cars transmit their operating data to manufacturers, and ICE trains are monitored by onboard sensors.
An ideal optimisation tool
The vast amounts of data generated by operations of this kind are stored. But why? And why do companies like Deutsche Bahn take the trouble to analyse it? The reason is that you can find out a great deal from this data. It can be read like a book. It is as much a chronicle of past events as it is a reflection of the present. Certain analytical techniques can even extract recommendations for future action from big data.
All of this means that big data offers considerable opportunities. And Deutsche Bahn does not want to miss out on them. “With its Group data strategy, Deutsche Bahn aims to become the leading big-data-driven mobility and logistics company,” says Joachim Bürkle, Head of the DB Systel corporate start-up Zero.One.Data. “The Group is putting data at the heart of its development of new business models. And much of the optimisation of its core processes will also go hand in hand with analysing large amounts of data.”
Data is often called the crude oil of the digital age. However, as experience shows, having raw materials does not guarantee a community’s well-being. The situation is much the same with raw digital data: simply having a lot of it is not necessarily much help. “As a result of digitised business processes, our customers’ data volumes are growing rapidly,” says Joachim Bürkle. The data strategy adopted by Deutsche Bahn aims to help identify big data opportunities faster and more easily, implement the associated use cases more quickly and improve inadequate data quality.
How do you mine digital gold?
Big data is only truly big when the people who generate it use it to add value. If the owner of a fitness tracker gets his blood sugar levels under control, if a car owner saves an expensive visit to the workshop or if a city can reduce its crime rate, the transformation of big data into what IT experts call “smart data” has succeeded. If we apply this to Deutsche Bahn, success would mean lower energy costs, fewer unplanned train cancellations, and more reliable lifts and escalators, for example.
At Zero.One.Data, the necessary data refinement is achieved through professional consulting services, software applications tailored to customers’ specific needs and highly efficient hardware solutions. This ensures that the business partner’s data is stored, analysed and presented appropriately.
But how does smart data come into being? This calls for mathematical tools known as algorithms, with the help of which certain patterns, characteristics and processes can be detected in vast unstructured data sets. In principle, algorithms are precise, targeted instructions for computers that trawl through these data volumes.
At Deutsche Bahn, too, data has no real value in and of itself. It gains this value only once analytics teams have extracted the data relevant to a particular issue from Deutsche Bahn’s data lakes. Earlier in the process, the data managers in the functional departments (the data owners) ensure that certain quality standards are met when the data is entered. This is done in accordance with the guidelines defined by the central Data Governance department of Deutsche Bahn AG.
There’s no need to fear big data
“Of course, the DB business units and their employees differ in how deeply they engage with digitisation,” says Joachim Bürkle. “Accordingly, their relationship to the use of data and the intensity with which they deploy data intelligence methods also differ.” However, one thing is clear: data analysis benefits every Deutsche Bahn department. This is not only because analysis of this kind is needed to determine whether a business unit is collecting the right data, but also because data intelligence can provide very specific recommendations for action. There is no need to fear endless columns of figures: the data can be visualised on easily understandable dashboards.
The range of big data analytics services is wide and varied. In asset intelligence, for example, this type of analysis can improve maintenance processes. Financial intelligence enables real-time insights into company-wide purchasing, controlling and financial data, among other things. In the field of customer intelligence, social media analysis and text mining can enhance communication with the customer – while respecting the General Data Protection Regulation and only with passengers’ consent, of course.
While it may be called big data, projects do not always have to start out on a grand scale. “At Zero.One.Data, we prefer an agile approach to developing relatively easy-to-understand analysis structures and algorithms,” says Joachim Bürkle. The goal is to develop minimum viable products (MVPs): in other words, affordable, clearly structured and rapidly deployable test balloons in the field of big data. These are the first step on the road to larger projects.
Whether with MVPs or major projects, interaction based on mutual trust between the DB business units and the analytics teams provides the basis for successful big data projects. “It takes close collaboration with the customer,” says Joachim Bürkle. “‘Think big, start small’ is the ideal way to discover the potential of big data in your own department.”
Big data at Deutsche Bahn: the hot spots
Want to find out more about the various big data activities within the Group? Then the Analytics Community DB on DB Planet (accessible only to DB employees) is just what you’re looking for. This is where data scientists, data analysts and data engineers in the DB Group regularly discuss ideas: https://db-planet.deutschebahn.com/workspaces/analytics-community-db/members/list
Big data
If data is generated at a volume, diversity and speed that can no longer be stored by conventional databases or handled by common data management tools, it is known as big data. To enable this data to be used for business or research purposes, it is analysed with reference to a specific question using mathematical formulas (algorithms).
Business intelligence/business analytics
If a business wants to know how to act in a given economic situation, big data can help. For this purpose, all company data is collected, analysed and visualised with reference to a question, either as a bar graph, pie chart or in another form. The key performance indicators at the end of the month or the end of the quarter are compared with those from the past (what happened when?) and support decision-making in the present.
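The period-on-period KPI comparison described above can be sketched in a few lines of Python. The KPI names and figures below are invented for illustration; a real business intelligence setup would pull them from company systems and feed a dashboard rather than print them.

```python
# Illustrative business-analytics sketch (hypothetical figures): month-end or
# quarter-end KPIs are compared with those from the past to support decisions.

def kpi_change(current: dict, previous: dict) -> dict:
    """Return the percentage change for each KPI present in both periods."""
    return {
        name: round((current[name] - previous[name]) / previous[name] * 100, 1)
        for name in current
        if name in previous and previous[name] != 0
    }

# Invented quarterly figures for the comparison "what happened when?"
q3_2023 = {"punctuality_pct": 92.1, "tickets_sold": 118_000, "revenue_eur": 2_450_000}
q3_2022 = {"punctuality_pct": 90.4, "tickets_sold": 110_000, "revenue_eur": 2_300_000}

for kpi, delta in kpi_change(q3_2023, q3_2022).items():
    print(f"{kpi}: {delta:+}% vs. previous year")
```

In practice, the visualisation step (bar graph, pie chart) would sit on top of exactly this kind of comparison.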
Data mining
Statistics, mathematics and algorithms are the tools of data mining. The aim is to detect hidden patterns and connections. Commonly used methods include clustering (formation of groups of similar data), regression analysis (what depends on which other factors?) and association analysis (if one thing happens, another does too).
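To make clustering, one of the methods just listed, concrete: the following minimal one-dimensional k-means sketch groups invented train delay figures into a "minor" and a "major" cluster. Real data mining would use a library and multidimensional data; this only illustrates the principle of forming groups of similar data.

```python
# Minimal 1-D k-means clustering sketch over invented delay data (in minutes).

def kmeans_1d(values, centres, iterations=10):
    """Assign each value to its nearest centre, then recompute the centres."""
    for _ in range(iterations):
        clusters = {c: [] for c in centres}
        for v in values:
            nearest = min(centres, key=lambda c: abs(v - c))
            clusters[nearest].append(v)
        centres = [sum(group) / len(group) for group in clusters.values() if group]
    return centres

delays = [2, 3, 4, 5, 38, 41, 45, 3, 50, 2]
print(kmeans_1d(delays, centres=[0, 60]))  # two cluster centres: small vs. large delays
```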
Data lake
Every day, every individual generates around 650 megabytes of data. On top of this, there is the data from Industry 4.0. Creating a data lake is a logical way of helping companies master this torrent of data. A data lake is a central location where every department can store its data, but where external data also ends up. It is the electronic memory of a company. If this lake is not to become a swamp, it is necessary to know which data is truly valuable and which techniques should be used to analyse it in the future and for what purpose.
Smart data
While big data and the data lake consist of both structured data (e.g. tables) and unstructured data (e.g. Word documents) from various sources, smart data is data that is examined for a specific purpose using certain algorithms and other tools. This data enables business models to be developed, and business processes and decision-making processes to be optimised.
- Descriptive analytics
This basic form of data analysis uses big data to figure out precisely what happened in the past. It is estimated that descriptive analytics accounts for more than 80 percent of business analytics. In most cases, descriptive analytics involves counting, filtering, and applying arithmetic. In addition, it is possible to analyse why something happened in the past. This extension is sometimes called diagnostic analytics.
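The "counting, filtering and arithmetic" at the core of descriptive analytics can be shown directly. The train-run records below are invented, and the 6-minute punctuality threshold is an assumption for the example, not a DB definition.

```python
# Descriptive analytics as counting, filtering and arithmetic: answering
# "what happened?" over invented train-run records.

runs = [
    {"line": "ICE 100", "delay_min": 0},
    {"line": "ICE 100", "delay_min": 12},
    {"line": "RE 5", "delay_min": 3},
    {"line": "RE 5", "delay_min": 0},
    {"line": "RE 5", "delay_min": 25},
]

total = len(runs)                                # counting
late = [r for r in runs if r["delay_min"] >= 6]  # filtering (assumed 6-min threshold)
punctuality = (total - len(late)) / total * 100  # arithmetic

print(f"{punctuality:.0f}% of runs were punctual")
```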
- Predictive analytics
This form of analytics is essentially the extension of descriptive analytics into the future. Based on historical data, a statistical model is built to determine what might happen next. Predictive analytics reveals trends, clusters and special cases. Data lakes, algorithms and artificial intelligence are used for this purpose. In essence, existing historical data is used to estimate data that does not yet exist. The goal is to derive decision-making options from data models. For example, if AI is able to analyse the mood of a text, this is a form of predictive analytics, because the resulting sentiment analysis yields variable options for action.
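A tiny statistical model of this kind is an ordinary least-squares line fitted to historical values and extrapolated one step ahead. The monthly passenger counts are invented; real predictive analytics would use far richer models, but the idea of turning past data into an estimate of not-yet-existing data is the same.

```python
# Predictive analytics in miniature: fit y = a + b*x to historical monthly
# passenger counts (invented numbers) and extrapolate one month ahead.

def linear_forecast(ys, step_ahead=1):
    """Ordinary least-squares fit, then extrapolation beyond the last point."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a + b * (n - 1 + step_ahead)

passengers = [100, 104, 107, 111, 116]  # thousands, months 1-5
print(round(linear_forecast(passengers), 1))  # projected value for month 6
```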
- Prescriptive analytics
In light of technological advances in big data analysis, it is increasingly possible to use data not only to reconstruct past and present states, but also to predict what will happen under certain circumstances. Prescriptive analytics goes a step further than predictive analytics: it gives companies multiple recommendations in the form of data-driven scenarios for the future. In other words, what could you do to make one outcome or another more likely? However, it requires as constant a stream of feedback data as possible, which is what makes prescriptive analytics a difficult task.
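Stripped to its core, prescriptive analytics scores candidate actions against modelled outcomes and recommends one. The scenarios and figures below are invented; in practice, the expected benefits would come from predictive models continuously updated by the feedback data the text mentions.

```python
# Prescriptive analytics sketch: score hypothetical action scenarios and
# recommend the one with the best modelled outcome. All numbers are invented.

def recommend(scenarios):
    """Pick the scenario with the highest expected benefit minus cost."""
    return max(scenarios, key=lambda s: s["expected_benefit"] - s["cost"])

scenarios = [
    {"action": "do nothing", "expected_benefit": 0, "cost": 0},
    {"action": "extra maintenance slot", "expected_benefit": 120, "cost": 80},
    {"action": "replace component early", "expected_benefit": 200, "cost": 150},
]

best = recommend(scenarios)
print("Recommended:", best["action"])
```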
Other articles on the featured topic of big data:
- Info tool "DB EarlyBird": Leveraging big data to boost customer satisfaction
- Line agent app as data lake: Understanding what passengers really want
- Image and video analysis using AI: (Soft) focus on anonymity
- Digital Situation Room: The data refiners for Group KPIs
- From Industry 4.0 to Work 4.0: How technology is transforming our working life