Search engines are among the most frequently used services on the internet. We are all used to seeing long lists of matching results whenever we enter a search query, and the whole process has become so easy and natural that very few people ever stop to think about how sophisticated the machinery behind the search field really is. Markus Schuch is one of those few. A developer in the “Search & Identity” delivery team at DB Systel, he has spent around ten years helping users find Group content with the internal search engine DB Suche. Railway employees can use the internal search tool on work computers, tablets and smartphones. As with any other search engine, they simply click on the input field and enter their search term, for example “holiday request”. As soon as someone starts typing, DB Suche suggests related terms to help them find what they’re looking for. Demand is high, with tens of thousands of visits every month, but DB Suche has by no means exhausted its potential: there’s always room for improvement.
User groups with different rights
A great deal of work has gone into making the search engine such a success, because finding information quickly is no straightforward task, especially in a large corporate group. “Search engines like Google have it easier because their web crawlers capture HTML content that is publicly available,” says the search specialist. It’s very much in the website owners’ interests to ensure that their content can be found. For this reason, they prepare their websites and their video and image data in such a way that search engines like Google can understand the content and find it more easily. But internal content in a corporate group is not publicly available. The data, some of which is sensitive, can only be accessed by employees via the intranet. And that’s not all: different groups of users also have different access privileges. This is why no one in the company is interested in carrying out complex search engine optimisation (SEO) for internal content. “We also tried web crawling at first but didn’t get very far because content was hidden behind logins and blocked access points, for example,” says Markus Schuch. Only logged-in users can access the content, in compliance with specific corporate guidelines. “We were forced to use back-end interfaces that we could secure cleanly.” Developers make this sound as easy as plugging a power cord into a socket. But it’s not quite that simple. To continue the metaphor, you first have to find out what kind of socket is in the wall, what cables are available and what signals the cables carry or are supposed to carry. “Every system has its own peculiarities. And because there are no standard solutions with uniform interfaces, we always have to build special connectors.”
Many systems, one search engine
Historically, different systems with separate search tools were used in the Group, and users had to know how each of them worked. “Of course, it’s much harder for users to find something if they don’t know where it’s stored,” says Markus Schuch. And he knows what he’s talking about. After all, he helped to develop search engines at Deutsche Bahn in the early days. “We have a web content management system that runs and hosts Deutsche Bahn’s websites on the internet and intranet. Each of these websites has always had its own search engine.” When the team eventually improved the careers portal’s search engine, they worked with the open source technology Apache Solr for the first time. This formed the basis for a central search index. The system was able to find and rank information from known sources, internal databases and relevant blogs. The pilot worked so well in 2013 that a decision was made to extend it step by step to all Group employees and to connect more and more sources, including additional wikis and the Group Regulation Database.
At virtually the same time, the call for a single solution came from various quarters within the DB Group. The Group conducted a preliminary study on enterprise search, and DB Systel provided the driving force for the development of a universal search tool. “Together with our corporate communications department, we developed a prototype and used it to try to connect other systems in addition to the usual websites.” While the relevant system owners allowed access to the data, the search experts developed suitable connectors. To make different data sources searchable in one place, their data must be catalogued into a central index.
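The combination of source-specific connectors and one central index can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DB Suche’s actual code: the `IndexDocument` schema, the `CentralIndex` class, both connector functions and all field names are hypothetical, and a plain in-memory dictionary stands in for a real search index such as Apache Solr.

```python
from dataclasses import dataclass


@dataclass
class IndexDocument:
    """Common shape that every connector maps its records to (hypothetical schema)."""
    doc_id: str
    source: str
    title: str
    body: str


class CentralIndex:
    """In-memory stand-in for a central search index."""

    def __init__(self) -> None:
        self._docs: dict[str, IndexDocument] = {}

    def add(self, doc: IndexDocument) -> None:
        # Re-adding a document with the same id overwrites the old entry.
        self._docs[doc.doc_id] = doc

    def search(self, term: str) -> list[IndexDocument]:
        # Naive case-insensitive substring match; a real index would rank hits.
        needle = term.lower()
        return [
            doc for doc in self._docs.values()
            if needle in doc.title.lower() or needle in doc.body.lower()
        ]


def wiki_connector(page: dict) -> IndexDocument:
    """Hypothetical connector: translates a wiki page into the common schema."""
    return IndexDocument(
        doc_id=f"wiki:{page['page_id']}",
        source="wiki",
        title=page["name"],
        body=page["text"],
    )


def regulation_connector(record: dict) -> IndexDocument:
    """Hypothetical connector: translates a regulation record into the common schema."""
    return IndexDocument(
        doc_id=f"reg:{record['number']}",
        source="regulations",
        title=record["heading"],
        body=record["content"],
    )


index = CentralIndex()
index.add(wiki_connector(
    {"page_id": 7, "name": "Holiday request", "text": "How to submit a holiday request."}
))
index.add(regulation_connector(
    {"number": "R-12", "heading": "Leave policy", "content": "Rules for holiday and special leave."}
))

hits = index.search("holiday")
```

Each source keeps its own field names, and only the connector knows how to translate them. The search itself never needs to know where a document originally came from.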
Not all information is supposed to be available to every employee, and not every employee is allowed to make information available. A Group-wide search engine therefore has to check permissions every time a search query is run. “The number one requirement is that the search tool only allows users to find content that they are allowed to see.” This sounds simple, but each source system has its own rules on how to handle user roles. “We had to figure out how to store access permissions generically in a search index so that we could filter the data at search time according to the user’s current permissions.”
A matter of privilege
DB Suche is able to search a number of systems while taking the read privileges of the current user into account. But how does it do this? Put simply, the search engine stores the privileges required to read a document in a field in the search index. The index is updated every 24 hours or so: in each run, the system picks up new, modified and deleted documents and updates the access privileges stored for them. When a user logs on to DB Suche, the search engine requests information about that user’s privileges from the source systems and incorporates it into the search query. In this way, the system can check at the time of the search which hits the user is allowed to read. If, for example, a user loses certain privileges, this information is incorporated into the search engine without delay. A change in the access privileges for a document, by contrast, takes effect after the next indexing run (i.e. after 24 hours at the latest).
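The principle of storing read privileges with each document and filtering at query time can be illustrated with a short Python sketch. It is a simplified model under stated assumptions rather than DB Suche’s implementation: the `IndexedDocument` class, the `read_groups` field, the `search` function and the group names are all hypothetical, and a list stands in for the search index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDocument:
    doc_id: str
    text: str
    # Privileges required to read the document, stored alongside it in the
    # index (hypothetical field, refreshed on each indexing run).
    read_groups: frozenset[str]


def search(index, term, user_groups):
    """Return the ids of documents that match the term and that the user may read."""
    needle = term.lower()
    return [
        doc.doc_id
        for doc in index
        # A hit counts only if the user holds at least one of the required groups.
        if needle in doc.text.lower() and doc.read_groups & user_groups
    ]


index = [
    IndexedDocument("hr-1", "Holiday request form", frozenset({"all-staff"})),
    IndexedDocument("mgmt-9", "Holiday budget planning", frozenset({"management"})),
]

# The user's groups are looked up at login and passed in with every query,
# so a change to them affects the very next search.
staff_hits = search(index, "holiday", frozenset({"all-staff"}))
manager_hits = search(index, "holiday", frozenset({"all-staff", "management"}))
```

In this model, the same query returns only `hr-1` for a regular employee and both documents for a manager. Changing `read_groups` on a document, however, only takes effect once the index entry itself has been rewritten, which mirrors the delay until the next indexing run.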
Markus Schuch loves solving complex challenges such as these. “We raise the profile of employees and work areas within the Group and help make information easier to find and share with fellow employees.” What’s more, the close and trusting collaboration with the various parts of the Group and the access to internal data sets make potential problems apparent. “Until now, the data in a system wasn’t considered holistically.” However, if data sets are to be merged in a meaningful way for the search engine, it’s important to look at all the available metadata. Only then can you filter the data sets by file type, author or date, for example. Inconsistencies in the data or data structures come to light in the process because the connection to DB Suche provides a comprehensive view of the data for the first time.
DB Suche reveals which areas are affected. Minor inconsistencies are ironed out directly in the connectors, while major problems have to be cleaned up by the providers. “In the Group, we use the information we discover to help eliminate the causes of poor data quality in the long term. The search tool leads to better data quality,” says Markus Schuch. The number of divisions that make their content searchable through DB Suche shows how well the search engine has already been received in the Group: DB Suche now combs through some 80% of the most important sources in the DB Group, including the social intranet DB Planet, the Electronic Staff Directory (EVI), the Group Regulation Database, the DB Management Portal, DB HR Online and a number of wikis.
But DB Suche is more than a Group-wide search engine. As part of the larger “Starke Suche” project, it is also a sign of the Group’s digital transformation. The “Strong Rail” Group programme is helping Deutsche Bahn to get ready for the digital age. And this also requires that internal information can be found quickly and made generally available. It must also be possible for DB employees to find the external information they need for their work without delay. For this reason, the “Starke Suche” project also includes a connection to the sustainable external search engine Ecosia. This connection boosts the positive impact of the Starke Suche project even further, since the sustainable search engine uses the revenues generated to plant trees in more than 15 countries. Now that’s a strong search – one that guarantees impressive results for DB Suche both inside and outside the DB Group.