The interactive nature of information use draws our attention to a key dimension – connecting information to people. The whole of Information Retrieval (IR) is focused on finding better ways to accomplish this goal.
At a time when the entire focus is on IT, we would like to focus on information, especially its capture and retrieval. Not just as a countercurrent but also because it is in sync with the new (well, not so new) fascination with analytics. Whatever our views on analytics, this much is clear – it is all about information, which, at the risk of sounding monotonous, is all about what is captured and how. Ilya Markov and Maarten de Rijke of the University of Amsterdam, in an article on ‘What should we teach in information retrieval?’, observe that “IR is about technology connecting information to people”. Reinforcing the non-technological dimensions of IR, they add that it is about understanding people’s behavior as they interact with others and with devices. The latter point is worth reiterating.
Devices of all kinds, running mostly on wireless technology at one frequency or another, play a central role in our lives, constantly capturing all kinds of data – this is really the inspiration for the Internet of Things. Naturally, while all this data can be captured, we must exercise judgment in deciding what to take up for analysis. The generation before the millennials might well recall the problem surrounding RFID when it was launched. It was thought to be condemned to capture enormous amounts of data, creating a serious problem of what to do with it all. Thanks to Walmart’s push for its use, RFID is quite common today, as are many other devices.
There is a certain logic to raising this here: we should specify the kind of enterprise whose IR we are analyzing. Take Cisco, which embraced a new business strategy woven around what it christened the Internet of Things (IoT). Or consider HUL. Clearly, IR in the two cases will be widely different, though both fall under the umbrella of interactive user behavior – with others and with devices. The information captured, the way it is captured and why it is captured will all deeply influence its retrieval. Such phenomena will be a part of our investigations as we continue exploring.
Meanwhile, let us take the example of Flipkart, the Indian ecommerce venture acquired by Walmart. As mobile phones began to be used for a variety of purposes besides calling, there was an expectation that more ecommerce would flow through mobile phones or tablets. This led to a certain neglect of the company website, only for Flipkart to discover later that the expectation was not borne out by data. Since information about the devices used by consumers was captured, Flipkart found that while users did use mobile phones in relation to ecommerce, they transacted through the company website. It went back to refashioning the website to make it more suitable for users to transact.
A great many subsequent problems can be avoided if enterprises focus on what information is most critical to their business operations, including their supply chain and vendors – in fact, anyone who is part of the ‘extended enterprise’. It also helps to understand that one of the distinguishing characteristics of any information is granularity. More than anyone else, those who are familiar with the ‘Need to know’ principle will be aware of the levels of granularity of information.
As Markov and de Rijke elaborate, “IR systems are best thought of as interactive systems – systems that are first developed offline and then put online (in the sense of being exposed to users), where they continue to be developed based on interactions with their users. Thus, the process of developing a modern IR system can be described as a two-step process: the initial development of the system can be called an offline phase, while the evolution of the system after its initial deployment can be called an online phase”. Most definitions of IR are a variation of this, and we will stay with this definition.
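The two-phase process in the quotation above can be made concrete with a small sketch. This is only an illustration under our own assumptions, not the authors' method: the names `build_index`, `InteractiveSearch` and `record_click`, the sample documents and the 0.5 click bonus are all hypothetical. The index is built once (the offline phase), and the ranking then shifts as users click on results (the online phase).

```python
from collections import defaultdict


def build_index(docs):
    """Offline phase: build an inverted index mapping term -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


class InteractiveSearch:
    """Toy IR system: index built offline, ranking adjusted online from clicks."""

    def __init__(self, docs):
        self.index = build_index(docs)   # offline phase
        self.clicks = defaultdict(int)   # online state: (term, doc_id) -> clicks

    def search(self, query):
        scores = defaultdict(float)
        for term in query.lower().split():
            for doc_id in self.index.get(term, ()):
                # One point per matching term, plus an assumed bonus
                # of 0.5 for every past click on this term/document pair.
                scores[doc_id] += 1.0 + 0.5 * self.clicks[(term, doc_id)]
        # Highest score first; ties broken by doc id for a stable order.
        return sorted(scores, key=lambda d: (-scores[d], d))

    def record_click(self, query, doc_id):
        """Online phase: learn from a user's interaction with the results."""
        for term in query.lower().split():
            if doc_id in self.index.get(term, ()):
                self.clicks[(term, doc_id)] += 1


# Hypothetical corpus, for illustration only.
docs = {
    "a": "leave policy for employees",
    "b": "travel policy and expense claims",
}
engine = InteractiveSearch(docs)
before = engine.search("policy")
engine.record_click("policy", "b")
after = engine.search("policy")
```

Before any interaction the two documents tie on the query "policy"; after a single recorded click on document "b", that document moves to the top. A real system would use far richer signals, but the division of labour between the two phases is the same.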
The link to enterprise search
Enterprise search, according to Markov and de Rijke, includes search, recommender systems and task-oriented dialogue systems. In ‘Modern IR’, Hawking remarks that “The application of information retrieval technology to information-finding within organisations has become known as ‘enterprise search’. Enterprise search may be interpreted as search of digital textual materials owned by an organisation, including search of their external Website, company intranet, and any other electronic text that they hold such as email, database records and shared documents”.
Given most people’s experience of searching Google for information in any format, employees expect finding corporate information to be fast and efficient. More important, they expect it to be done through a single interface. In an article in the Journal of Information Science, 2019, Vol 45, titled ‘Enterprise search and discovery capability: The factors and generative mechanisms for user satisfaction’, Paul H Cleverley and Simon Burnett of Robert Gordon University, UK, state: “Results show that 62% of dissatisfaction events were due to human (information and search literacy) rather than technology factors. Cognitive biases and the ‘Google Habitus’ influence expectations and information behaviour and are postulated as deep underlying generative mechanisms. The current literature focuses on ‘structure’ (technology and information quality) as the reason for enterprise search satisfaction, agency (search literacy) appears downplayed. Organisations which emphasise ‘systems thinking’ and bimodal approaches towards search strategy and information behaviour may improve capabilities”. Drawing on a mixed-method longitudinal study with feedback from 1000 users, and echoing Hawking, they add that enterprises attempt to recreate a Google-like search experience and end up dissatisfied.
In any case, as ‘Modern IR’ (Addison Wesley, 2010) observes, enterprise search (ES) is a serious challenge for IR because of factors such as structured and unstructured information, multiple formats, different access levels, non-uniform classification methods, multiple legacies and silos, to mention a few. Also keep in mind that enterprise search will not have as many links as web search, because far fewer people use it. Let us recall Bill Gates’ observation in 2005 that the problem in ES is that there aren’t enough links to build a good search. Clearly, this must be stated as a constraint, the way we do in optimization problems. It may be argued that this is the root of the dissatisfaction – expecting the same level of performance (in all senses) from enterprise search as from web search. Rather, what is important is to address this ‘deficit’ – the absence of a large number of links – and factor it into building an efficient search for an enterprise. Perhaps there is a clue in domain-specific, and even sub-domain-specific, specialized searches, whose user population is small but informed. The limited nature of links may be set off against a (relatively) smaller volume of data to search through (as compared to a web search). In my view, this is a good foundation to build on, along with some other key factors.
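To see what treating the link deficit as a constraint might look like in practice, here is a minimal sketch. It is a simplification under stated assumptions, not a description of any real enterprise search product: the `Document` fields, the numeric access scale and the `enterprise_search` function are hypothetical. Ranking relies only on the content of each document (how often the query terms occur), with no link-based signal at all, and access levels are enforced before anything is ranked.

```python
from dataclasses import dataclass


@dataclass
class Document:
    doc_id: str
    text: str
    source: str        # e.g. "intranet", "email", "database"
    access_level: int  # hypothetical scale: higher means more restricted


def enterprise_search(query, documents, user_clearance):
    """Rank documents the user may see by how often query terms occur in them."""
    terms = query.lower().split()
    results = []
    for doc in documents:
        if doc.access_level > user_clearance:
            continue  # enforce access levels before ranking
        words = doc.text.lower().split()
        score = sum(words.count(term) for term in terms)
        if score > 0:
            results.append((score, doc.doc_id))
    # Highest score first; ties broken by doc id.
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in results]


# Hypothetical records from three different sources.
corpus = [
    Document("d1", "quarterly sales report sales figures", "database", 1),
    Document("d2", "sales strategy board memo", "email", 3),
    Document("d3", "canteen menu", "intranet", 0),
]
staff_view = enterprise_search("sales", corpus, user_clearance=1)
board_view = enterprise_search("sales", corpus, user_clearance=3)
```

The same query returns different result lists for different users, which is one reason enterprise search cannot simply copy web search. Because the collection is comparatively small, even a plain content-based score like this one can be computed over every permitted document.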
Takeaways
Connecting information to people
Understanding people’s behavior as they interact with others and devices
The link of IR to enterprise search needs to be elaborated
The major cause of dissatisfaction with ES is human factors, not technology
ES is a challenging problem for IR