After reading this chapter, you will be able to:
Do you need directions to a friend’s new apartment? Want to find that website your adviser recommended? Thinking about upgrading your cell phone? Looking to rent your textbooks for next semester? Resolving a bet on who put up the winning goal in the 2008 NCAA championship? No matter the domain, our instinct is to Google the information we need.
Choosing to use Google before all other research tools may seem like a no-brainer. After all, Google gives us what we want, and it is so fast and easy! We might not know how or why, but Google always seems to give us the results that we like. And Google is not shy about telling us how great it is. At the top of each search engine results page (SERP) a software program declares that “About X results” were retrieved in “0.XX seconds.” This humble-brag seems unnecessary when all we really want is for the answer to our query to appear on the first page. In fact, we really want it to be on the first screen so we don’t have to scroll down, or maybe even as the top result. Fortunately, so often, Google delivers just that.
You are not alone if your preferred search engine is Google. Most internet users rely on Google. During 2017, approximately 75 percent of desktop computer users and approximately 93 percent of mobile device users searched the web using Google over all other internet search engines such as Baidu, Bing, or Yahoo!, according to NetMarketShare, a leading company that analyzes web traffic and web technology.
Using Google without questioning how it works makes sense to most of its users. Media scholars Ken Hillis, Michael Petit, and Kylie Jarrett, in their book, “Google and the Culture of Search,” state it succinctly: “Today Google feels like a good deal to most of its users. It is free, easy to use, and doesn’t require a searcher to reveal his or her ignorance about a subject in front of another human being such as a librarian.”
Given how much we rely on Google for our information needs, it’s helpful to consider just how Google does what it does, and whether we should be more critical when scanning through our Google results. In this chapter, therefore, we consider several questions about Google. Thinking about information systems, that is, about who and what — Google, libraries, librarians, and information and computer scientists — organizes information and how and why they do it, can help us become more conscious and critical consumers of information.
Does Google Have It All?
To understand how Google achieves its fast results, and to judge whether it is actually saving us time, it is helpful to consider how content gets into Google’s database of information, and how much of it is included. There is a myth that everything is on the internet, and that all we have to do to access it is to Google it. Despite Google’s mission to “Organize the world’s information and make it universally accessible and useful,” not all the world’s information is included in Google’s database. If you want to be an expert at finding relevant information, you will have to think beyond this myth.
How does stuff get on the internet?
First off, not all of the world’s information is on the internet. For information to be on the internet, it has to be in electronic format and stored within one of the many connected, networked computer systems. But the content indexed by Google is only a small fraction of the internet. Unless the information is open to search engine crawlers, Google’s algorithms will not be able to retrieve it. Google results only provide access to approximately “0.03 percent of the information that exists online (one in 3,000 pages),” according to a 2015 Popular Science article.
You can watch the following Google video to learn more about crawlers, or spiders.
As the video and other content explain, Google and other search engines use programs generically referred to as crawlers or spiders, which are algorithms that follow links on webpages to discover, scan, index, and sort webpages. But because the spiders’ crawling and copying can only include what is made openly available, Google’s index includes just a portion of the web.
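The link-following behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not Google's actual crawler: the tiny `WEB` dictionary, its page names, and the `public` flag are all invented for the example. It shows why a page that is blocked, or that nothing links to, never reaches the index.

```python
from collections import deque

# Hypothetical mini-web: each page lists the links it contains.
# "public" marks whether a crawler is allowed to read the page.
WEB = {
    "home": {"links": ["news", "login"], "public": True},
    "news": {"links": ["home", "archive"], "public": True},
    "login": {"links": ["grades"], "public": False},
    "grades": {"links": [], "public": False},
    "archive": {"links": [], "public": True},
    "orphan": {"links": [], "public": True},
}

def crawl(start):
    """Follow links breadth-first from a public start page and
    return the pages discovered, in the order they were found."""
    seen = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for link in WEB[page]["links"]:
            # Skip pages already found and pages closed to crawlers.
            if link not in seen and WEB[link]["public"]:
                seen.append(link)
                queue.append(link)
    return seen

indexed = crawl("home")
print(indexed)  # ['home', 'news', 'archive']
```

In this toy web, "grades" sits behind a blocked login page and "orphan" has no incoming links, so neither one is discovered even though "orphan" is technically public.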
Google not only scans, indexes, and sorts every word on a web page, but it also does this for every metadata field. Metadata means “data about data” that accompanies internet content. It can be created automatically or manually.
You probably are familiar with basic metadata fields such as title, author, creation date, and subjects that are created for articles and books to enable fast search retrieval. Similarly, when you take a picture on your phone, the date the picture was taken is automatically assigned as the image’s metadata. If enabled, other data such as location and address book contacts also may be saved in the metadata. Users can manually add to the metadata by, for example, identifying everyone in the image or adding information about the event at which the photo was taken. Google indexes all the metadata associated with an image or a document, which helps its algorithms to quickly match the words and phrases when searched.
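One common way to make word matching fast is an inverted index, which maps each word to the items that contain it. The sketch below is a simplified illustration under that assumption; the sample documents, field names, and the `build_index` function are hypothetical and are not taken from Google. It shows how a photo with no descriptive text can still be found through its metadata.

```python
from collections import defaultdict

# Hypothetical items: body text plus a few metadata fields.
DOCS = {
    1: {"text": "Team wins the final", "title": "Championship recap", "author": "Lee"},
    2: {"text": "Family photo at the lake", "title": "IMG_0042", "location": "Lawrence Kansas"},
}

def build_index(docs):
    """Map each lowercase word to the set of item ids that contain it,
    whether the word appears in the body or in any metadata field."""
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for value in fields.values():
            for word in value.lower().split():
                index[word].add(doc_id)
    return index

index = build_index(DOCS)
print(sorted(index["kansas"]))  # [2], found only through the location metadata
print(sorted(index["final"]))   # [1], found through the body text
```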
Once spiders index information, many factors contribute to how Google quickly recalls and ranks results with precision. In its results ranking, Google favors websites that display quickly, over content that loads slowly. Google doesn’t have time to wait for pages to load because Google wants users to think its search is lightning quick.
What can’t I find?
Not all web content is created equal. Some content on the internet is intentionally hidden, while other content is fraught with errors that cause web crawlers and spiders to not discover and index it. This portion of the web is referred to as the deep web. If it includes illegal activity, the content is considered to be part of the dark web.
There are many techniques that webmasters use to prevent Google and other search engines from scanning and indexing their materials, according to Google. These blocking options include not publishing their URLs, embedding instructions in websites’ HTML code for accessing the websites or other content, and storing proprietary content in password-protected directories.
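One widely used form of instruction embedded in HTML is the robots meta tag, for example `<meta name="robots" content="noindex">`. The chapter does not name a specific mechanism, so treat this as one illustrative case. The sketch below uses Python's standard `html.parser` module; the `RobotsMetaParser` class and `may_index` function are names made up for this example, and a real crawler checks much more than this single tag.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(part.strip().lower() for part in content.split(","))

def may_index(html):
    """Return False when the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives

print(may_index('<head><title>Syllabus</title></head>'))                # True
print(may_index('<head><meta name="robots" content="noindex"></head>'))  # False
```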
Proprietary information not indexed in Google may actually be the stuff that you need now, but it is stashed behind paywalls. This includes information like scholarly journal articles and other primary sources. Google discriminates against some content, and in favor of other content, and maintains guides and tools that favor some websites in search results ranking. This means that the stuff you really need as a student or professional may not even show up in your search results.
Is Every Google Search Treated the Same?
One way that Google speeds up its searches is by guessing which category each search falls into.
In a seminal article, “A taxonomy of web search,” Andrei Broder, a distinguished Google scientist, identified three categories of search types: navigational, transactional, and informational. Google also hinted that it divides user intent into these three categories in a 2017 handbook for search quality evaluators.
Let’s consider these three categories in greater detail.
A navigational search happens when a user wants to go somewhere in real life or online. Navigational searches are fairly easy to recognize and for Google to execute successfully.
There are two subcategories of navigational searches: “visit-in-person” and “website” searches. If you enter a street address or a name of a business, the search engine assumes that you are interested in visiting this place in person, and it offers you directions. Alternately, if you enter part of a URL, the search engine guesses that you want to visit the website, and will navigate to a specific website. If the search engine predicts your navigational queries correctly, then you are more likely to see the search engine as accurate, reliable and easy to use.
When a user performs a transactional search, he or she expects to end up having a dynamic interaction with an internet site. Google defines the transactional query as a “do” query because the user wants to do something on a website. These interactions could be anything from filling out an application, to writing reviews and ratings, to downloading files, to making purchases.
Transactional searches are the easiest to monetize, that is, to make money off of, according to Broder. Monetization occurs when the search engine recognizes that a user is potentially in the market to buy something. Users who perform transactional searches are identified as optimal recipients of targeted advertising. Internet industry leaders assert that many users view targeted advertising favorably. Facebook founder Mark Zuckerberg said as much during the questioning session of a 2018 congressional committee hearing:
“What we have found is that even though some people don’t like ads, people really don’t like ads that aren’t relevant. And that while there is some discomfort for sure with using information to make ads more relevant, the overwhelming feedback we get from our community is that people would rather have us show relevant content there than not.”
Zuckerberg and others believe that Google and other platforms retain their users’ loyalty by sufficiently responding to their transactional queries not only with relevant results but also relevant ads, as opposed to annoying, irrelevant ads.
In an informational search, the search engine algorithm might not determine the full intent of a user’s query. To identify and address the intent of search users, Google conceptually divides an informational query into two levels: the “know simple” query, and the more complex “know” query. A “know simple” query is one that easily can be asked in a trivia night contest question or in a multiple choice, fill-in-the-blank, or short answer test question. Google provides a “know simple” response in a familiar format on the results page, often without navigation to a specific site. You’ve seen these “know simple” results framed in a box at the top of the results page.
A “know” query, in which a user is doing research to gain insight and develop knowledge, is much more complex and challenging for Google to fulfill. In library science circles, a “know” search is also called a cognitive exploratory search. An exploratory search is a process of learning and investigating that requires looking at multiple sources of information, according to information scientist Gary Marchionini. This process of exploration requires more human engagement in browsing, comparing and contrasting, and evaluating results than simpler searches. Contrary to Google’s mantra of “fast and easy,” exploratory search takes time and requires us to examine results in an iterative process of searching, evaluating, and reformulating questions.
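To make the three categories concrete, here is a toy classifier that guesses a query's type from surface clues. It is a rough sketch only: the cue words and patterns are invented for this example and do not come from Google's guidelines or from Broder's article, and real systems rely on far richer signals.

```python
import re

# Illustrative cue words only; not drawn from any published guideline.
TRANSACTIONAL_CUES = {"buy", "rent", "download", "order", "apply"}

def classify(query):
    """Guess whether a query is navigational, transactional, or informational."""
    q = query.lower().strip()
    words = set(re.findall(r"[a-z0-9']+", q))
    # Part of a URL or a street address suggests a navigational search.
    looks_like_url = re.search(r"(www\.|\.com|\.org|\.edu)", q)
    looks_like_address = re.match(r"\d+\s+\w+\s+(st|street|ave|avenue|rd|road)\b", q)
    if looks_like_url or looks_like_address:
        return "navigational"
    # A "do" word suggests the user wants to complete a task on a site.
    if words & TRANSACTIONAL_CUES:
        return "transactional"
    # Everything else is treated as a request for information.
    return "informational"

print(classify("www.example.edu"))                 # navigational
print(classify("rent textbooks next semester"))    # transactional
print(classify("who scored the winning goal"))     # informational
```

Notice that the informational category is simply the fallback here, which echoes the point that these queries are the hardest for an algorithm to pin down.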
Does Google Know What I Want?
It may seem like magic to have Google offer relevant suggestions and useful results. So, does Google divine our intent? The short answer is, no. Dan Russell, a research scientist at Google, described Google’s combination of understanding user intent and user behavior as “divining intent.” But it is really the result of good computer programming based on data and research.
Internet search engines such as Google are designed to meet the information-seeking needs of their users, even when these users do a lousy job defining what they are searching for. Google is built on the assumption that internet users are not search experts, and that they do not use advanced search functions. That is, most Google users don’t identify keywords to describe their information needs, and they do not know search operators.
Does that mean that Google can read its users’ minds when they don’t explain well what they want to search for? Not exactly.
Google is programmed to help us “not sweat the small stuff,” by accommodating our variations in capitalization, punctuation and spelling.
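A simplified version of this kind of cleanup can be written with Python's standard library. The sketch below assumes a small, made-up `VOCABULARY` of known terms and uses `difflib.get_close_matches` as a stand-in for spelling correction; it is not how Google does it, only an illustration of the idea.

```python
import difflib
import string

# Hypothetical vocabulary of terms the search engine already knows.
VOCABULARY = ["apartment", "championship", "semester", "textbook"]

def normalize(query):
    """Lowercase the query, strip punctuation, and snap each word to the
    closest known term when one is similar enough."""
    cleaned = query.lower().translate(str.maketrans("", "", string.punctuation))
    fixed = []
    for word in cleaned.split():
        match = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.8)
        fixed.append(match[0] if match else word)
    return " ".join(fixed)

print(normalize("Textbok, SEMESTER!"))  # textbook semester
```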
More importantly, it turns out, what we search for is not that unique. Other users before us have asked Google about similar things. Google reports that only about 15 percent of searching on any given day is unique. The repeat searches (approximately 85 percent) appear as autofill as we type a search.
Because most searches aren’t new, Google can collocate and organize information into efficient, precise results. Once this information is organized, Google’s algorithm guesses what the most relevant information might be for the search term entered. The more users search for the same thing, the more accurate Google’s guesses become.
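The idea that repeated searches feed suggestions can be illustrated with a simple frequency count. In this sketch the `PAST_QUERIES` log and the `suggest` function are hypothetical; it ranks earlier queries that begin with what the user has typed so far, most popular first.

```python
from collections import Counter

# Hypothetical log of past queries; repeats stand for popular searches.
PAST_QUERIES = [
    "weather today", "weather today", "weather today",
    "weather radar", "weather radar",
    "western movies",
]

def suggest(prefix, log, limit=2):
    """Return up to `limit` past queries that start with the prefix,
    ordered from most to least frequent."""
    prefix = prefix.lower()
    counts = Counter(q for q in log if q.startswith(prefix))
    return [query for query, _ in counts.most_common(limit)]

print(suggest("wea", PAST_QUERIES))  # ['weather today', 'weather radar']
```

The more often a query appears in the log, the higher it rises in the suggestions, which mirrors the claim that guesses improve as more users search for the same thing.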
Google’s algorithm also personalizes the relevancy of our results and the order in which these results are displayed. This personalization contributes to our immediate satisfaction with Google, and adds to our perception that Google is easy. In order for Google to make it seem so easy to provide fast, precise, personally ranked results, algorithms interpret the intent of the user queries and inform result ranking.
Peer Tutorial: Google Searches
In this video, Grace Fawcett (JOUR 302, fall 2018) reviews three types of Google searches, and highlights some features of Google results.
Is Google Free?
We do not directly give Google money for the results it presents us, so in that sense, Google is free. But we do pay Google indirectly. By feeding queries into Google’s search box, we contribute information about ourselves, and about the things that interest us. Google and other search engines store data about us and about our collective searches. Google uses this information to make its search results better and to motivate users to continue relying on it for their search needs. The first way we pay indirectly for using Google, therefore, is by having Google use the information we leave with it to make itself function better.
Another way we pay Google indirectly is by contributing to the research that Google conducts on its search users.
For years, scientists, engineers, and researchers at Google (and other industry, academic, and government institutions) have been trying to improve their understanding of human-computer interaction (HCI) and information retrieval (IR), which is about figuring out what you want from the internet and how you get it. This research has included controlled scientific experiments, ethnographic observations of how people conduct online searches, and analyses of search logs.
Google and other online companies engage in constant surveillance to collect information about our online activities, and in analyses of what we look at, what we click, our reading level, and more. Such research helps technology firms discover what we prefer, dislike, understand, don’t understand, use, misuse, observe, ignore, remember, and forget, in order to design information systems that meet our expectations.
Each new search triggers a complex network of programs, to the point that no one outside of Google really knows how specific search results occur. It is even possible that the results that a user looks at are part of a current experiment that Google is running. Researchers outside of Google refer to this testing as “noise” in the results.
The last way we pay Google indirectly is that Google uses the stored data about us and our searches to play matchmaker, pairing us with targeted ads that are displayed alongside search results. Google’s main business model is to sell very targeted advertising to its users, based on the search terms these users provide it.
Google feeds our data into an advertising program called AdWords. AdWords is a real-time online broker that displays ads from the highest bidder that match the search terms and phrases a user types into the search bar. As you probably have observed, if you search for a pair of boots, Google will sell advertising space next to its search results to a company that identified boots as one of the keywords it wanted matched with its ad. Loyal and returning Google users are essential for its business model to succeed, because the more information users feed Google, the more targeted and successful its advertising can be.
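The matching step described above can be sketched as a small keyword auction. This follows the chapter's simplified "highest bidder" description; the advertisers, bids, and the `pick_ad` function are invented for illustration, and production ad auctions are generally more involved than a single comparison of bids.

```python
# Hypothetical advertisers: the keywords each wants matched and its bid in dollars.
ADS = [
    {"advertiser": "Shop A", "keywords": {"boots", "sandals"}, "bid": 0.80},
    {"advertiser": "Shop B", "keywords": {"boots"}, "bid": 1.25},
    {"advertiser": "Shop C", "keywords": {"laptops"}, "bid": 2.00},
]

def pick_ad(query, ads):
    """Return the highest-bidding ad whose keywords overlap the query words,
    or None when no advertiser asked for any of those words."""
    words = set(query.lower().split())
    matching = [ad for ad in ads if ad["keywords"] & words]
    return max(matching, key=lambda ad: ad["bid"], default=None)

winner = pick_ad("waterproof hiking boots", ADS)
print(winner["advertiser"])  # Shop B
```

Shop C bids the most overall but never asked for "boots," so it is not considered for this search.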
Are Google Search Results Biased?
Google wants its users to feel that searching with Google is fast and easy, because this ensures that these users keep returning to Google. But we know from experience that doing research and learning, in the way that our teachers have taught us to do it, is slow and frustrating. What is the result of this difference between fast results and slow research?
We are drawn to fast and easy. But because fast and easy search results are possible only when result options are narrowed, using Google means that we continually put ourselves in a filter bubble.
Anytime search results are narrowed to make the results come up faster, some results are actively left out while others are favored. It may not be obvious what is appearing in our filtered results and what is missing from them. Thus, search results are never completely neutral.
Two aspects of Google’s search engine function illustrate the concept of bias: relevancy results instead of possible null results, and biases in programming algorithms.
Google and other search engines are designed to avoid giving us zero search results, which is also known as null query results. If we misspell a word, an algorithm automatically searches for alternate spellings or correctly spelled words. If we do not know jargon, another algorithm retrieves semantically similar and related words, as if we had entered additional terms using a thesaurus. Unfortunately, we are often unaware how these algorithms affect the results.
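The fallback behavior can be sketched as a search that quietly widens itself when it would otherwise return nothing. Everything here is hypothetical: the `PAGES` collection, the `RELATED` thesaurus, and the `search` function are stand-ins for illustration. The second value returned is a flag showing whether the query was expanded, which is exactly the detail a real results page rarely makes obvious.

```python
# Hypothetical page collection and thesaurus of related terms.
PAGES = {
    1: "find a doctor near campus",
    2: "campus parking map",
}
RELATED = {"physician": ["doctor"]}

def search(query):
    """Return (matching page ids, whether related terms were substituted)."""
    q = query.lower()
    hits = [pid for pid, text in PAGES.items() if q in text]
    if hits:
        return hits, False
    # No direct match: retry with related terms instead of returning nothing.
    for alt in RELATED.get(q, []):
        hits += [pid for pid, text in PAGES.items() if alt in text and pid not in hits]
    return hits, bool(hits)

print(search("parking"))    # ([2], False)
print(search("physician"))  # ([1], True)
```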
While algorithms are learning from our null query experiences, we might not be learning as much ourselves. For instance, the search engine might learn from other internet searchers that the phrase “one and done” is related to the NBA Eligibility Rules. So instead of getting results that only include the searched phrase “one and done,” the results might include the NBA Eligibility Rules, which lack the phrase “one and done.” This example makes it seem that presenting something related is always a good thing.
However, consider an instance in which avoiding null results means presenting inaccurate information because credible sources are not available. Consider the Atlantic’s reporting on the problem of school shooting misinformation.
The problem of relying on algorithms to construct our search results, even if there are no search results, is that algorithms can be biased. When a computer program is designed to make assumptions about a query, any bias that we experience in the real world can creep into the computer program and present biased results. This video, Machine Learning and Human Bias, published by Google in 2017, describes a few forms of bias acknowledged to be present in search engine algorithms. Google has been criticized for not aggressively addressing various biases in its search results.
We recognize that search engines are not neutral, and we know it is impossible to completely remove bias. At the same time, it is very difficult for us to understand why the programming provided our specific search results. We do, however, have a phrase to describe the phenomenon: the black box problem. To close this section, consider the quote from Science magazine’s staff writer Paul Voosen:
It [machine learning algorithms and search results] is all about trust. If you have a result and you don’t understand why it [computer neural network] made that decision how can you really advance in your research? You really need to know that there’s not some spurious detail that’s not throwing things all off.
Activity 1: Advanced Search
Activity 2: Impact of Context in Understanding User Intent
Activity 3: Google Alternatives
Using the search engine Answer the Public, explore the results of the following searches, and discuss the results that appear. You can also choose your own keywords to search.