Cell Phone Data: A Gold Mine for Telecoms
Cell phone companies are finding that they're sitting on a gold mine--in the form of the call records of their subscribers.
Researchers in academia, and increasingly within the mobile industry, are working with large databases showing where and when calls and texts are made and received to reveal commuting habits, how far people travel for public events, and even significant social trends.
With potential applications ranging from city planning to marketing, such studies could also provide a new source of revenue for the cell phone companies. "Because cell phones have become so ubiquitous, mining the data they generate can really revolutionize the study of human behavior," says Ramón Cáceres, a lead researcher at AT&T's research labs in Florham Park, NJ.
If you were an AT&T subscriber and were near Los Angeles or New York between March 15 and May 15 last year, there's a 5 percent chance that your data was crunched by Cáceres and his colleagues in a study of the travel habits of the company's subscribers. The researchers amassed millions of call records from hundreds of thousands of users in 891 zip codes, covering every New York borough and 10 New Jersey counties, as well as Los Angeles, Orange, and Ventura counties in California.
The data set is a collection of call detail records, or CDRs--the standard feedstock of cell phone data mining. A CDR is generated for every voice or SMS connection. Among other things, it shows the origin and destination number, the type and duration of connection, and, most crucially, the unique ID of the cell tower a handset was connected to when a connection was made.
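As a rough illustration of the fields just listed, a CDR could be modeled as a small record type. This is only a sketch: the field names, the comma-separated layout, and the `parse_cdr` helper are hypothetical, and the start time is an assumed field. Real carrier formats vary and are not described in this article.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallDetailRecord:
    """One voice or SMS connection (hypothetical field names)."""
    origin_number: str
    destination_number: str
    connection_type: str      # "voice" or "sms"
    start_time: datetime      # assumed field: when the connection was made
    duration_seconds: int     # 0 for an SMS in this sketch
    tower_id: str             # unique ID of the cell tower the handset used


def parse_cdr(line: str) -> CallDetailRecord:
    """Parse one comma-separated line in the assumed layout:
    origin,destination,type,ISO start time,duration,tower ID
    """
    origin, dest, kind, start, duration, tower = line.strip().split(",")
    if kind not in ("voice", "sms"):
        raise ValueError(f"unknown connection type: {kind!r}")
    return CallDetailRecord(
        origin_number=origin,
        destination_number=dest,
        connection_type=kind,
        start_time=datetime.fromisoformat(start),
        duration_seconds=int(duration),
        tower_id=tower,
    )
```

The tower ID is the only location signal in such a record, which is why location can be known only as precisely as the tower's coverage area.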
That let the AT&T team know the location of a phone to within a one-mile radius at the time each CDR was generated, making it possible to determine the distance traveled from home by each cell phone every day. The group found that, on average, people living in Manhattan travel 2.5 miles most days, compared to five miles in Los Angeles. "But we also found that when you look at the longest trips people make, people that live in New York go significantly further, 69 miles on a weekday compared to 29 in Los Angeles," Cáceres says.
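One way to picture the daily-distance calculation is the sketch below. It is not the researchers' method. It assumes each record has been reduced to a (timestamp, tower ID) pair, that tower coordinates are available in a lookup table, and that "distance traveled from home" means the farthest tower seen from home on a given day. The function names are hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime

EARTH_RADIUS_MILES = 3958.8


def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def daily_max_distance_from_home(records, tower_coords, home):
    """Return {date: miles} giving the farthest tower from `home` seen each day.

    records      -- iterable of (datetime, tower_id) pairs for one phone
    tower_coords -- dict mapping tower_id to (lat, lon); assumed to be known
    home         -- (lat, lon) of the phone's estimated home location
    """
    farthest = defaultdict(float)
    for ts, tower_id in records:
        distance = haversine_miles(home, tower_coords[tower_id])
        day = ts.date()
        farthest[day] = max(farthest[day], distance)
    return dict(farthest)
```

Because a phone is only located when it makes or receives a call or text, a figure computed this way is a lower bound on how far the person actually went that day.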
The cell phone networks are thinking about monetizing their data, says Jean Bolot, a researcher at network operator Sprint. This means a "two-sided" business model where they not only serve end users but also make money through relationships with other businesses. "This is new in the telco space but not in other areas--look at Google, for example," he says.
Blondel's research includes an analysis of connections between two million cell phone users in Belgium. It revealed that the French-speaking and Dutch-speaking populations of the country are barely connected by calls and texts. "This is interesting, since there are already discussions within Belgium about splitting the country in two," says Blondel.
This network shows phone calls between around two million cell phone users in Belgium over six months; each dot represents a tightly connected group of people, and its color represents the language they speak. The Dutch-speaking (green) and French-speaking (red) communities are starkly divided, linked only by a smaller cluster representing users in Brussels.
Research in this area is typically focused on aggregate information and not individuals, but questions remain about how to protect user privacy, Blondel says. It is standard to remove the names and numbers from a CDR, but correlating locations and call timings with other databases could help identify individuals, he says. In the MIT study, for example, the team could infer the approximate home location of users by assuming it to be where a handset was most often located between 10 p.m. and 7 a.m., although they also lumped people together into groups by zip code.
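The home-location heuristic described here can be sketched in a few lines. This is an illustration under stated assumptions, not the study's actual code: it takes (timestamp, tower ID) pairs for a single phone, counts records rather than time spent near each tower, and uses the hypothetical names `is_night` and `infer_home_tower`.

```python
from collections import Counter
from datetime import datetime


def is_night(ts: datetime) -> bool:
    """True if the timestamp falls between 10 p.m. and 7 a.m."""
    return ts.hour >= 22 or ts.hour < 7


def infer_home_tower(records):
    """Return the tower seen most often at night, or None if there is no night activity.

    records -- iterable of (datetime, tower_id) pairs for one phone
    """
    night_counts = Counter(tower for ts, tower in records if is_night(ts))
    if not night_counts:
        return None
    return night_counts.most_common(1)[0][0]
```

The sketch also shows why stripping names and numbers is not enough on its own: a handful of late-night records can point to roughly where someone lives.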
"I feel the scientific community should take responsibility for finding out how to trade off having useful data and protecting privacy," says Blondel. He is investigating the effect of techniques like using approximate rather than exact location information, or blurring the exact time stamps of calls from a data set.
"I feel the scientific community should take responsibility for finding out how to trade off having useful data and protecting privacy," says Blondel. He is investigating the effect of techniques like using approximate rather than exact location information, or blurring the exact time stamps of calls from a data set.