The New Data Scientist
Movies like Moneyball, companies like Netflix and statisticians like Nate Silver have brought big data into focus for us. Given enough information and smarts, data experts can gain insights that help win baseball games on a budget, predict what we’d like to watch next and call the state-by-state outcomes of presidential elections.
What’s interesting, though, is that the “experts” doing this work aren’t always who you’d think they’d be -- top-gun statisticians or data scientists with heavy-duty computational skills. According to Virginia Tech business professor Barbara Hoopes, “Equally as important are people who can use cost-benefit analysis tools and have deep business understanding about what can be done with this asset, which is the data that companies have access to.” These are the individuals who can hear “the voice of the data,” she adds, and know what kinds of questions to ask of it.
Hoopes and computer science professor Naren Ramakrishnan teach in Virginia Tech’s Online Master of Information Technology program. The program brings together disciplines from the university’s engineering and business colleges and supplements foundational courses with six concentrations, such as “Big Data” and “Analytics and Business Intelligence.”
Demand for data experts
Courses about data topics are bursting at the seams, says Ramakrishnan, with many more students wanting to register than can be accommodated. That level of interest reflects the strong employment outlook for big data jobs. An article in Forbes magazine reported that in the previous 12 months, just three companies -- IBM, Cisco and Oracle -- had advertised more than 26,000 open positions requiring an understanding of big data. The median salary cited in that article for professionals with big data expertise was $124,000 a year, with a range from $83,000 at the low end to $165,000 at the high end.
A major driver for this growth, he emphasizes, is just how data-driven we’ve all become in our own lives. “We’re as much data producers as data consumers,” he says. And while we all understand that the digital crumbs generated by our daily activities are being picked up and stored somewhere, the truth is that a large share of that data is left unanalyzed. “Most of the time people just archive it, and it is still sitting in a data warehouse as a curiosity.”
What’s different now, says Hoopes, is that on top of the collection of data, which has “always been there,” we now also have systems that let us explore it. “You have tools that are being developed that make this so much more accessible to your average manager, that are a little more point-and-click. So you don’t necessarily have to be a computer scientist to take advantage of digging down and understanding what’s going on in some of these large pools of data.”
While a computer scientist may be well positioned to “write an algorithm” that mines the data, “without the business understanding, they wouldn’t know the right questions to ask,” Hoopes points out. That’s where someone with subject-matter expertise as well as training in business intelligence and data analytics excels, she says. “They have the business knowledge, but they also have an understanding of what’s possible through data mining.” With that “powerful combination” they’re well positioned to make the complex connections that define the use of big data.
The university’s Discovery Analytics Center, directed by Ramakrishnan, is a Virginia Tech-wide effort that brings together researchers and students to tackle applied problems in important areas of national interest, and its work offers an example of such a complex connection. One recent project for a government agency examined the use of “surrogate” data sources to forecast societal events. Ramakrishnan’s team analyzed data from OpenTable, the restaurant reservation system, and found that a spike in reservation cancellations can correspond to a disease outbreak. “It could either be an early onset of the flu season,” explains Ramakrishnan, “or it could be an episode of food poisoning. We do not know the specific reasons, but such observations can serve as an early indicator.”
Curiosity, technical aptitude and business understanding are the hallmarks of those excelling in this field. “You have to have domain knowledge,” Hoopes observes. “Then there’s this extra curiosity factor and tenacity and a willingness to dig around a little bit and visit a few dead ends.” Those are personality traits, she says, “that are really valuable in this field.”
And they must be able to communicate the story of the data, says Hoopes. “If I told you something about a clustering algorithm, I might lose you unless I was also able to tell you, ‘OK, here’s an example,’ ” she explains. “That’s something that advanced training really helps with -- not only on how to do the analysis on the data but on how to communicate the analysis so it makes the most difference and has the most value to a business.”
The formula is working. “Our students are recruited into data-science companies such as Google, Facebook and LinkedIn,” says Ramakrishnan. Alongside those are non-IT sectors, such as automakers and oil and gas companies. “Many of them are launching data-science teams, so they’re all looking to staff these new positions that are coming up. The fact that our students keep getting requests for interviews is a good sign.”
For the Greater Good
Kathy Anderson was a systems-analyst developer for Virginia Tech when she decided to earn a master’s degree to increase her “knowledge base.” While she could have gone after an MBA, as many do in her position, she chose the road less traveled, electing to immerse herself in the areas that personally interested her -- statistical theory and data mining.
It was a plus, Anderson says, that the program she chose for her advanced degree happened to exist at the school where she worked. “I decided on Virginia Tech because at that point in time they were ranked in the top three for the master’s level IT program, and I thought it would be a great opportunity to graduate from such a prestigious institution.”
Soon after starting, she was sold on online learning. “The program is excellent in both content delivery and the cohort aspects of it,” she says. “Every class in this particular program involved group work, and the group work enabled you to practice your skills in a real setting with the team.”
For example, a project in one class used internal data provided by a nationwide organization that wanted to optimize its sales performance. Another project examined “favorable factors” for corporate geolocation. A third project researched a major IT firm’s extensive patent portfolio and developed ways to help the company maintain the portfolio while also increasing its knowledge base for itself “and for the world at large for the greater good.”
While Anderson started out taking classes part time, she “unexpectedly” won funding from the Department of Defense’s SMART scholarship program, which targets students pursuing degrees in STEM fields, enabling her to quit her job and attend school full time. She graduated in May 2016 with a dual master’s in information security and data science. A stipulation of the SMART program is that the recipient commit to a year of service after graduation. So that’s what she’s doing now -- along with working as a teaching assistant in the same program that issued her degree.
Although Anderson can’t share what her current job is (“It’s kind of classified, which is the nature of the work.”), she insists that she’s using what she “learned and everything else.” In fact, she expects, once her term of commitment is over, to land another position in the same industry. “What I find motivating and rewarding is being able to work for personal remuneration, but also for the greater good of society,” she notes. “Being able to do that is what makes the work more fulfilling and purposeful. Having a sense of working for more than oneself is very motivating and makes [me] get out of bed every morning.”
Of course, Anderson isn’t lacking in personal drive -- a characteristic she says is essential to succeeding in an online program. “Based on my own experience, self-motivation is critical to your eventual success, as well as being able to do more than what is expected. Working hard is very important,” she says.
Now she says she has to “fend off” recruiters. “The IT industry as a whole is short on IT skills already,” she says. “Having a graduate-level degree is a big advantage.”
What’s next with data analytics
The growth of big data isn’t expected to slow down any time soon. Hoopes believes the largest untapped area right now is textual analysis. “It’s not as if it’s not currently being done, but I think its greatest power has yet to be revealed.” She points to the work of one Virginia Tech research team that’s analyzing reviews on Amazon to identify potential safety hazards.
“When you and I go out and visit websites, a lot of it is text-based,” she says. “And I think the tools to easily and quickly use that text to do some of the same kind of analysis and predictions and similarity comparisons that we can do with numbers, that’s still coming down the pike.”
Ramakrishnan sees ethics as a growing concern for the field. “For example, if your resume is being looked at by an algorithm and the algorithm decides if you’re going to be called for an interview or not, how are you sure that the algorithms aren’t making bad decisions? When you have data-science concepts being applied everywhere across an organization, there are going to be a lot of ethical issues that come up.” The university won a National Science Foundation grant to create a course on that subject, expected to launch in 2017.
He also suggests that data experts within organizations are going to become “highly specialized.” The data analyst in a manufacturing position will have a different set of skills, for instance, than somebody in marketing and advertising. As a result, the university is planning to weave a “plus-data” concentration into many of its disciplines to help graduates prepare for a world where data are pervasive and decisions are increasingly “evidence-based.”
Big data and data analytics “is a good space to be in,” sums up Ramakrishnan. “There are quite a few possibilities.”
Written by SmartBrief Education