Katelyn Ilkani and Dr. Kirk Borne discuss how behavioral analytics work, and how they are applied in cyber security. Dr. Borne also discusses the future of behavioral analysis.
You've tuned in to Security Economy, where we discuss big trends impacting the future of cyber security. We explore how technology, human behavior and money are driving cyber security forward.
Let's hear from Kirk.
Kirk, thanks for joining me today. I'm excited for our conversation on behavioral analytics and cyber security. To get us started, I'd love for you to tell me and our audience more about yourself.
Great. Well, thank you, Katelyn. It's really great to be here today, and this is a very interesting topic for the world.
So, just a little about myself. Right now, I am the principal data scientist at Booz Allen Hamilton. Booz Allen Hamilton is a technology and consulting company.
In this role, I do some executive advising, mentoring, training, and a lot of social promotion of data science and analytics, AI and machine learning.
People may know me from social media. That's where I'm very active: I find a lot of knowledge and interesting concepts on social media and in articles I read, and then I share them with people. Behavioral analytics is one of those topics which I find quite fascinating.
One of the reasons I find that fascinating is my background.
You may not see the connection here at first. My background is in astrophysics, and you say, "What has that got to do with that other topic?" But for me, as an astrophysicist, it was always about finding patterns in the data, and how things move in space and time.
Data analytics is essentially the same concept applied to other data sources, not about planets and stars, but certainly about things that move through space and time, like cyber actors, or employees, or customers, or medical patients.
There's a behavioral analytics angle on all of those cases, which is nothing more than again, collecting data from things that are moving through space and time and looking for what those patterns tell you.
That's very cool. I have not thought of it like that.
I always use the killer asteroid as my metaphor.
Don't be afraid; we don't have any evidence of an asteroid heading towards Earth to wipe out civilization, though there are other things happening around the world today in terms of viruses and things that have us all kind of worried, obviously.
But the killer asteroid issue is one that could happen someday, where we see a thing moving through space called an asteroid. We see it in orbit; we track it; we measure it; we collect data.
From the data, we can build a predictive model of where it's going, just like a customer or a cyber actor or a medical patient.
And the outcome that we predict may not be one that we desire, or prefer by any stretch. The killer asteroid is a pretty extreme case, but think of a medical patient who may not recover from an illness, or a customer who may take their business to our competitors, or a cyber actor who may get into our systems in a bad way. Those are all outcomes we don't desire.
If we can detect that through behavioral analytics, then we can use predictive analytics to say, "What do these behaviors lead to," and if we don't like that outcome, what can we do to change it? That's called prescriptive analytics.
You just heard my short little lesson on prescriptive versus predictive analytics.
Predictive says, "Here's what's coming." Prescriptive says, if you don't like it, what can we do to change it? That's where we get into the analytic side of the equation, which is: what does the data tell us in terms of what forces, what treatments, what actions, what moves we can make to drive this thing from its predicted outcome to a better outcome?
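The predictive versus prescriptive distinction above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the straight-line trend, the failed-login counts, the limit of 50, and the two actions with their "expected reduction" values are all hypothetical and do not come from the conversation.

```python
from statistics import mean

def fit_line(ts, ys):
    """Least-squares slope and intercept for y = slope * t + intercept."""
    t_bar, y_bar = mean(ts), mean(ys)
    slope = (sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
             / sum((t - t_bar) ** 2 for t in ts))
    return slope, y_bar - slope * t_bar

def predict(ts, ys, t_future):
    """Predictive step: extrapolate the fitted trend to a future time."""
    slope, intercept = fit_line(ts, ys)
    return slope * t_future + intercept

def prescribe(predicted, limit, actions):
    """Prescriptive step: pick the first action expected to bring the outcome within the limit.

    actions is a list of (name, expected_reduction) pairs, mildest first.
    Returns None when no action is needed and "escalate" when none is enough.
    """
    if predicted <= limit:
        return None
    for name, expected_reduction in actions:
        if predicted - expected_reduction <= limit:
            return name
    return "escalate"

# Hypothetical data: failed logins per day for one account over four days.
days = [1, 2, 3, 4]
failed_logins = [10, 20, 30, 40]
forecast = predict(days, failed_logins, 6)

# Hypothetical actions and their assumed effect on the forecast.
actions = [("notify_user", 5), ("force_password_reset", 20)]
print(forecast, prescribe(forecast, limit=50, actions=actions))
```

The prediction alone only says where the trend is heading; the prescriptive step is the part that maps that forecast to something you can do about it.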
Kirk, can you tell us a bit more about what behavioral analytics entails?
Well, first of all, it's really just about patterns in data.
That's really mostly what analytics is about, of course, and behavioral has to do with, yes, behavior.
But what does behavior represent? It represents an intention, a motivation, an action taken towards a goal.
So with human behavior, whether we're talking about customers, or patients, or cyber actors, or whatever, behaviors indicate those things. What are you motivated by? What are your intentions? What are your goals? What are you trying to accomplish?
Of course, there's always good actors and bad actors in any cyber environment, and so behavioral analytics is really about understanding those patterns.
If there's an action, or an activity happening in your network, which is normal and expected for a particular user or user group, then that's fine.
But if you see some anomaly in that behavior, then maybe some kind of alert should be raised or some kind of action taken to see if it's your regular team just doing something new, or if there's an invader, an infiltrator, or, worst case, even an insider.
The insider threat is one of the bigger threats in cyber security, which is the person who does belong there, and does have access, but they're doing things they shouldn't be doing.
These things are detected through the behaviors, that is, the patterns indicated in the data.
I feel like behavioral analytics in cyber security has become more important, really, over the past few years in particular, as the insider threat has been more and more acknowledged.
Can you give us some more examples of how behavioral analytics is used in security?
The real reason, I would think, even though I'm not a cyber security expert, one of the reasons behavioral analytics is getting greater visibility and application, is that the early days of cyber security were more reactive. That is, you see something, and then you react, whereas we want to get to the stage where we're proactive.
We see it in advance and take action to ward it off. Prescriptive analytics, which I just talked about, has a funny name. It's called predictive reactive.
So it's not just reactive, which is bad because you just react after the horse has left the barn, so to speak. And predictive, again, can predict a bad outcome, but if you just sit back and watch the bad outcome happen, that's not very helpful either.
So behavioral analytics gives us the insights into the behaviors, intentions, motivations, as I was saying, through those patterns, and that gives us a chance to react in a predictive way.
That is, if we see this moving in this direction, we now know to take this action. So you can sort of predict what your action should be, or prescribe, I should say, because that's prescriptive.
Prescribe what your actions should be, should you see this behavior moving in a direction that you think could lead to a compromised situation. So behavioral analytics is really, again, going beyond reactive and just predictive to a prescriptive type of analysis and reaction in the cyber environment.
This makes me wonder about data privacy.
How does data privacy come into play for tracking all these behaviors and then making predictions about how people may be doing bad things in your network?
For the most part, I would say network logs do not carry personally identifiable information, PII.
You're looking at a network packet, and you're seeing behavior, and it doesn't mean you know the individual, or you've even labeled it as a particular individual.
Really, nobody's privacy is even close to being violated there because you don't even know who it is, or what it is.
But if you get to the point where it looks like adversarial behavior or compromising sorts of behaviors are happening, then if it is someone in your company, you definitely want to investigate that insider threat.
If it's something outside of your company, again, you may not even ever know who the bad actor is, because it's coming from some other part of the world perhaps.
Again, there's no sort of identifiable information in that, but the pattern in the data suggests action needs to be taken.
I think we have sort of a little bit less of a privacy concern. We just think about it as this set of points, data points, moving through space and time, and it carries contextual information as to where it's coming from and when it's happening, and what things are happening, things this actor is doing, but it doesn't necessarily contain any identifiable information that we would consider a privacy violation.
What do you think regular people need to know about how businesses and governments could be using their data?
Of course, people need to know that organizations are using their data, which tends to get sort of a negative knee-jerk reaction when someone says that, but pretty much everything you do in the world today is using your data.
When you're on your phone, for example, shopping, buying, searching, using the map, using your email, using your text messages. That information is there, and being used by the app.
We're sort of freely giving it but obviously, we don't want to freely give away personal information. Though we sometimes do, either intentionally or not.
Agencies, businesses, organizations of all kinds need to be aware of these concerns, and be able to tell people how they are using their data.
There's a lot of regulation now about data privacy statements and data usage statements, which organizations are required to provide to people.
Being transparent hopefully helps people feel less threatened by the use of their data.
Again, when we talk about things like tracking and monitoring behaviors, we're not tracking you as a person, we're tracking your signal.
If your signals suggest you're doing something appropriate, logging into your bank account to check a balance, then no information is carried forth. That is the thing, no one's saying, "Oh, look at what Katelyn's bank account balance is." None of that kind of thing is happening.
What's happening is if, for example, Katelyn logs into her bank account, and then starts somehow looking at someone else's bank account, and there's no evidence in the system that she is related to that person, perhaps it's an indication of a hacker who's sort of jumping from account to account within the system.
That behavior, in the data, is the signal that's being paid attention to, not the name of the person or the details about them.
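The bank account example can be sketched as a check on the signal alone, with no names or balances involved. This is a minimal sketch, assuming a hypothetical event format of (session id, account logged into, account accessed) and a hypothetical set of known account relationships; none of these structures come from the conversation.

```python
def flag_account_hopping(events, linked):
    """Return the session ids that accessed an account unrelated to their login.

    events: iterable of (session_id, login_account, accessed_account) tuples.
    linked: set of frozenset({a, b}) pairs with a known relationship,
            for example joint account holders.
    """
    flagged = set()
    for session_id, login_account, accessed_account in events:
        if accessed_account == login_account:
            continue  # looking at your own account is the expected behavior
        if frozenset((login_account, accessed_account)) not in linked:
            flagged.add(session_id)
    return flagged

# Hypothetical, pseudonymous identifiers only.
events = [
    ("s1", "acct_A", "acct_A"),
    ("s1", "acct_A", "acct_B"),  # no known link between A and B
    ("s2", "acct_C", "acct_D"),  # C and D are linked below
]
linked = {frozenset(("acct_C", "acct_D"))}
print(flag_account_hopping(events, linked))
```

Only the session identifier is reported, which matches the point that the behavior, not the person, is what gets attention first.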
So as businesses think about adopting analytics for security, where do you think they should start?
I have a mantra which a lot of people have, and that is think big, but start small.
There's another expression, don't boil the ocean. I mean, that's sort of a business term, it sounds sort of awkward to me. I don't use that very often, but it's the same sort of concept.
And so what does that mean? Well, for example, starting small might be just, again, understanding a given user account on a system. Let's say, since we're talking about cyber systems here, that for a given user account, there are certain applications that user typically uses during the course of the business day.
There's email, maybe they query a certain database, or they're using certain software tools or packages. Just keep tabs on that. If you see any anomalies in that during the course of the day, users who are now using something other than what's within the scope of their work, then that might be fine if they're assigned some new task or given some new project, but it also might be some kind of warning signal.
Even just keep a running list of the applications people are allowed to use, and then also maybe a separate list of the projects they are assigned to.
If a person is assigned to a new project, then they inherit, so to speak, the privileges that go with that project, which are those other applications.
What things are within your reach that are allowed? If things go outside that reach, then an alert can be generated. And again, the alert doesn't necessarily mean there's something bad. It just means a little extra investigation would be necessary to make sure everything is okay, and if it isn't, then actions can be taken.
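The two running lists described above, applications per user and applications per project, can be sketched as plain dictionaries. The user, project, and application names here are hypothetical, and the data layout is an assumption made for illustration.

```python
def allowed_apps(user, base_apps, user_projects, project_apps):
    """Applications within a user's reach: their own list plus what each project grants."""
    apps = set(base_apps.get(user, ()))
    for project in user_projects.get(user, ()):
        apps |= set(project_apps.get(project, ()))
    return apps

def out_of_scope(user, used_apps, base_apps, user_projects, project_apps):
    """Applications the user touched that fall outside their allowed reach."""
    reach = allowed_apps(user, base_apps, user_projects, project_apps)
    return sorted(set(used_apps) - reach)

# Hypothetical lists for one user.
base_apps = {"u17": {"email", "crm_db"}}
user_projects = {"u17": ["billing_migration"]}
project_apps = {"billing_migration": {"billing_db"}}

used_today = ["email", "billing_db", "hr_records"]
print(out_of_scope("u17", used_today, base_apps, user_projects, project_apps))
# ['hr_records']
```

A non-empty result is only a prompt for a closer look, not a verdict; the user may simply have been given a task that the lists do not yet reflect.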
So you don't need to get too complex, but if you want to move up the ladder, so to speak, to predictive and prescriptive, then you have to get sort of into the realm of looking at more temporal behavior, that is the behavior over a course of time.
If you see a certain pattern over time, of a particular user class within your business, we just call that a trajectory in data space. That trajectory is normal for a person, let's say a person who just was hired, and they become more and more proficient in their work, they start using different databases, different applications, different enterprise tools.
That's a typical trajectory, but if that trajectory starts diverting, then again, that's a warning signal.
The trajectory in that case actually lends itself to sort of prediction, right? A new employee starts learning these things, and they learn these other things, and they start using these new things.
That's a path, a trajectory; you can predict where that person should be in their journey as an employee.
If that starts deviating from where you predicted it should be going, that's the predictive thing: it points to an outcome you don't desire or, perhaps, again, need to investigate.
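One way to picture the "trajectory in data space" is as a week-by-week count for a new hire compared against the average of earlier hires in the same role. The sketch below assumes hypothetical data (the number of distinct tools used per week) and a hypothetical tolerance; it is not a method described in the conversation.

```python
from statistics import mean

def expected_trajectory(cohort):
    """Average the cohort week by week; cohort is a list of equal-length lists."""
    return [mean(week) for week in zip(*cohort)]

def first_divergence(observed, expected, tolerance):
    """Return the first week (1-based) where observed strays beyond tolerance, else None."""
    for week, (obs, exp) in enumerate(zip(observed, expected), start=1):
        if abs(obs - exp) > tolerance:
            return week
    return None

# Hypothetical: distinct tools used per week by three earlier hires.
cohort = [
    [1, 2, 3, 4],
    [1, 3, 4, 5],
    [1, 2, 4, 6],
]
new_hire = [1, 2, 4, 9]  # week 4 jumps well past the cohort average of 5

expected = expected_trajectory(cohort)
print(first_divergence(new_hire, expected, tolerance=2))  # 4
```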
If some kind of corrective action is needed, that's the prescriptive step. The more advanced form of analytics is getting to that predictive and prescriptive phase. But even just the first steps, which are the descriptive phase, are useful: what are the things that should be within the use cases and application domain of this particular user, and do they deviate or not?
Those are simple, what I call just sort of kindergarten math, if you will: counting statistics. How many times does a person check their email every day? How many times do they use this application and access that database?
Those kinds of counting statistics are signals that are descriptive of a person's normal behavioral practice on their job. If somehow those numbers start changing, say the balance of the time you spend querying a database suddenly changes, that could be an issue.
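The counting statistics described above can be checked with a simple z-score: how far today's count sits from the user's own history, measured in standard deviations. This is a sketch under assumptions: the activity names, the counts, and the threshold of 3 are hypothetical, and a z-score is just one common way to compare a count against its baseline.

```python
from statistics import mean, stdev

def z_score(history, today):
    """Standard deviations between today's count and the historical mean.

    history needs at least two values, since stdev is a sample statistic.
    """
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return 0.0 if today == mu else float("inf")
    return (today - mu) / sigma

def unusual_counts(history_by_activity, today_by_activity, threshold=3.0):
    """Return the activities whose count today is far from that user's norm."""
    flagged = {}
    for activity, history in history_by_activity.items():
        today = today_by_activity.get(activity, 0)
        if abs(z_score(history, today)) > threshold:
            flagged[activity] = today
    return flagged

# Hypothetical daily counts for one user over five working days.
history = {"email_check": [20, 22, 19, 21, 18], "db_query": [5, 6, 4, 5, 5]}
today = {"email_check": 21, "db_query": 40}
print(unusual_counts(history, today))  # {'db_query': 40}
```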
These ideas sound simple, but I'm wondering for our listeners, if actually implementing it may be rather complex. Do you have any thoughts on what an implementation strategy looks like?
Not specifically, no.
Most enterprise software does keep logs of who is logging in, and when, and for how long, so those logs already exist.
In some sense, there are network log tools that track particular IP addresses and what users are accessing. Those things are already accessible, and again, with this concept of start small but think big, you can just start managing that.
So there are tools already that do some of this sort of counting statistics, if you want to think of it that way.
If you start adding a temporal component, that by itself sounds daunting, but again, it's a matter of just tracking over time these particular network accesses and application usages of a given employee or class of employees, again to track the behavior over time.
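Adding the temporal component can start as nothing more than bucketing existing log lines by day. The sketch below assumes a hypothetical log format of "ISO timestamp, user id, application" separated by whitespace; real enterprise and network logs will look different and need their own parsing.

```python
from collections import Counter
from datetime import datetime

def daily_usage(log_lines):
    """Count accesses per (day, user, application) from simple log lines."""
    counts = Counter()
    for line in log_lines:
        stamp, user, app = line.split()
        day = datetime.fromisoformat(stamp).date().isoformat()
        counts[(day, user, app)] += 1
    return counts

# Hypothetical log lines in the assumed format.
log_lines = [
    "2020-03-02T09:15:00 u17 email",
    "2020-03-02T11:40:00 u17 email",
    "2020-03-03T10:05:00 u17 crm_db",
]
usage = daily_usage(log_lines)
print(usage[("2020-03-02", "u17", "email")])  # 2
```

Once counts are keyed by day, the same table supports both the simple counting checks and any later look at how behavior drifts over weeks or months.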
I think the information is already present; I don't think there's necessarily a large learning curve. At some point, again, we do want to think big.
We start small, but we do want to think big, and there are far more aggressive tools out there in terms of machine learning and AI that do more of a deep dive into the patterns, which identify the intents and purposes of individuals and motivations.
Not too surprisingly, or not too unsurprisingly, or whatever, the world of marketing analytics is a good place to start looking, because in marketing, they track, in a non-identifiable, non-privacy-violating way, the intentions and motivations and patterns of purchases of customers. Those things are again in just regular logs of the company. Search histories, purchase logs, things like that.
Those marketing tools already are aimed toward customer journey analytics. In fact, there are tools which are exactly journey analytics tools.
Imagining applying the same exact thing to cyber actors is not a far stretch, and so there are certainly cases where you can just almost off the shelf, take something used for another use case, just feed it a different data source, and it can be applied.
That's an interesting idea, to start thinking more broadly, looking at how analytics is being applied in other fields, for perhaps this to feel a bit less intimidating.
Yeah, that's why I like to use my killer asteroid metaphor, because again, it crosses different discipline boundaries, whether it's marketing or cyber, medical patients, the same sort of thing.
What do you see happening over time for a particular person, or another way to say that in marketing-speak is "persona."
The same language is used in cyber security. There's a persona, and that persona doesn't mean a person; it means a class of behaviors, a class of persons.
So, there might be a persona in your business of people who spend most of the day on the database. There's a class of people, that is, a persona, that spends most of their day in the call center answering calls, and therefore accessing customer purchase logs or complaint histories.
And then there's maybe people who spend most of the day on email, like me. Those particular personas sort of identify the typical behavior pattern in that class of individual. If you see someone sort of moving outside that persona, again, that doesn't necessarily mean a bad thing.
For example, in marketing, we all have personas, like a person may like outdoor equipment and sports events, and suddenly they're buying, you know, children's books. Well, that seems like a real deviation. Maybe they have a new baby at home, so there's a perfectly logical reason why someone may drift from persona to persona.
It's easy enough to identify those cases, and then that's basically what some of these marketing tools do. I'm sure cyber security tools are doing the same sorts of things. Even though I'm not in the market of cyber security software, it seems very logical that it should be true.
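Persona drift can be sketched by describing each persona as a typical split of the working day and assigning a user to whichever persona is closest. Everything here is a hypothetical illustration: the three personas, their time shares, and the choice of Euclidean distance are assumptions, not something taken from a specific marketing or security tool.

```python
import math

def distance(a, b):
    """Euclidean distance between two activity profiles (dicts of time shares)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

def nearest_persona(profile, personas):
    """Name of the persona whose typical profile is closest to this one."""
    return min(personas, key=lambda name: distance(profile, personas[name]))

# Hypothetical personas: share of the day spent on each activity.
personas = {
    "database": {"db": 0.7, "email": 0.2, "calls": 0.1},
    "call_center": {"calls": 0.7, "db": 0.2, "email": 0.1},
    "email": {"email": 0.8, "db": 0.1, "calls": 0.1},
}

last_month = {"email": 0.75, "db": 0.15, "calls": 0.10}
this_month = {"db": 0.65, "email": 0.25, "calls": 0.10}

before = nearest_persona(last_month, personas)
after = nearest_persona(this_month, personas)
print(before, after, before != after)  # email database True
```

As with the new baby in the marketing example, a change of persona is a reason to ask a question, since a new assignment explains it just as well as a compromise does.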
Kirk, as we look to the future, where do you think this field is headed?
Well, for one thing, we hear AI a lot in the conversation, artificial intelligence, and I think AI is going to help in a lot of ways.
It's an automated process. Okay, so people think of AI, primarily with the first letter artificial, and we think of all the horrible things that can happen with artificial robots and artificial humanoids and things, but I like to use a different word for the letter A, and that's more like augmented intelligence or assisted intelligence or accelerated intelligence.
The tools around AI, which are basically based upon the machine learning algorithms that are part of behavioral analytics or analytics in general, can help us do triage on these patterns.
That is to say, what things are normal, what things need immediate attention, and what things, you know, maybe can be looked at later. So there are certain ways we can identify the patterns, the things that need immediate attention and those that don't.
The AI is assisting the human investigator, whether that's your IT person, your IT department, or chief security officer or whomever. The AI can help manage the data flood, if you will, and identify the patterns, create the personas, identify drifts and shifts in the persona patterns of individuals and so on.
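The triage idea, splitting patterns into normal, look-at-later, and needs-immediate-attention, can be sketched as a simple bucketing step. This assumes some upstream model has already given each alert an anomaly score between 0 and 1; the scores, alert ids, and the two thresholds are hypothetical.

```python
def triage(alerts, review_at=0.5, urgent_at=0.9):
    """Sort (alert_id, anomaly_score) pairs into three buckets by score."""
    buckets = {"immediate": [], "later": [], "normal": []}
    for alert_id, score in alerts:
        if score >= urgent_at:
            buckets["immediate"].append(alert_id)
        elif score >= review_at:
            buckets["later"].append(alert_id)
        else:
            buckets["normal"].append(alert_id)
    return buckets

# Hypothetical alerts with scores from an assumed upstream model.
alerts = [("a1", 0.12), ("a2", 0.95), ("a3", 0.61), ("a4", 0.30)]
print(triage(alerts))
# {'immediate': ['a2'], 'later': ['a3'], 'normal': ['a1', 'a4']}
```

The human investigator still makes the call; the bucketing only decides what reaches them first.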
The AI is going to accelerate, augment and assist even more of these types of things. The other thing it is really good at is identifying the anomaly - the thing that's different from everything else.
In that sense, it really is emulating a human intelligence.
I always like to tell people to suppose you're in a dark room, a completely, totally dark room. You surround yourself with cameras, which are video cameras, looking in all the different directions - up, down, left, right, forward, backward. Basically six directions, and you're scanning non-stop with this video recording the room.
Now suddenly, if there's a little light that comes on, let's say someone lights a match in the room, you, as the human being, recognize that instantly.
You immediately see the light in the room. Whereas if you were to take this information from the cameras, the stream of data from a high-definition streaming video feed, it would take minutes to hours to analyze the whole data stream and to identify exactly at what point in the room that's happening.
Whereas what an AI would do is basically say, if everything in the room is black, then I just report zero.
You don't have to look at six high definition video feeds. It's just a zero.
Then as soon as there's something that changes, you change that zero to a one. And then immediately you know there's something that has changed, without you looking through hundreds of gigabytes of streaming video.
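The dark room idea, reducing a flood of video to a single zero or one, can be sketched as follows. Frames are represented here as flat lists of pixel intensities, which is a simplifying assumption; the baseline of 0 for "black" and the tolerance are also hypothetical.

```python
def change_signal(frames, baseline=0, tolerance=0):
    """Report 0 for each frame that matches the baseline everywhere, else 1."""
    return [
        int(any(abs(pixel - baseline) > tolerance for pixel in frame))
        for frame in frames
    ]

# Hypothetical tiny frames: all black until a match is lit in the third one.
frames = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 255, 0, 0],
]
print(change_signal(frames))  # [0, 0, 1]
```

Nobody has to watch the footage while the signal stays at zero; the first one is the cue to go and look.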
And in a sense, it's emulating what a human is really good at: identifying the thing that's different. So that's really the important thing in cyber analytics, cyber security analytics, and that is to find that new emergent behavior that's different.
Sometimes they call that the zero day event in cyber security, right.
Our antivirus software is trained on patterns, and known viruses and known Trojans and known worms, and sort of known things that people do.
If there's some new behavior from the bad actors, it doesn't get trained into our antivirus software until maybe days or weeks or months later.
So how do you get the day zero event, the one that isn't already part of your antivirus software training? The anomaly discovery that AI provides, sort of sifting through mountains of data to find the one little matchstick, so to speak, that's out of place, is really what AI can help us with.
So Kirk, how do businesses prepare now for how things may evolve?
Well, one of the first shifts that has to happen is the culture.
That's usually what people say is the number one roadblock, or the number one challenge, or the number one thing to focus on initially: getting the culture lined up with an analytical way of looking at their business.
For cyber security or behavioral analytics, cyber analytics, to succeed, people have to be accepting.
In order for behavioral analytics in the cyber world to succeed, people have to be able to accept that this is a valuable contribution and an assisting technology to the work that we're already doing.
They don't need to feel like their jobs are threatened. Getting into the data, understanding data, becoming more data literate, is probably the first step.
Data literacy is something I push quite a bit. It's not that everyone's going to be a hardcore data scientist, but really, people need to know that identifying patterns and anomalies in data is a very human thing.
We could do that with our data the same way we do it with our eyes and our ears all the time. We as humans detect these things naturally, and we're just training algorithms to do that.
It's about getting people and the culture to accept that this is an assisting technology, not a technology that's going to remove you from your job; it's going to make your job more successful, more efficient, more effective.
I always like to use those two words in sort of parallel: efficient and effective. You become more efficient as you find those anomalies as fast as possible. And you become more effective as you find them as well as possible.
In a sense, it's like the precision and recall metrics that we see on social media.
That is, if you do a search, did you find everything that you want? Did you find only the things you want, without too much other extraneous stuff? That's really the problem in the cyber world: if we start looking at every possible bit of data in the network log, obviously, it's impossible.
I mean, it's just a fire hose, and so again, we need people to accept that data is our friend.
Data analytics is a tool. And it's really making our jobs way more efficient.
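The precision and recall metrics mentioned above answer the two search questions directly: of the things flagged, how many were really of interest (precision), and of the things really of interest, how many were flagged (recall). A minimal sketch, with hypothetical event ids:

```python
def precision_recall(flagged, truly_bad):
    """Precision: share of flagged items that are truly bad.
    Recall: share of truly bad items that were flagged."""
    flagged, truly_bad = set(flagged), set(truly_bad)
    hits = len(flagged & truly_bad)
    precision = hits / len(flagged) if flagged else 0.0
    recall = hits / len(truly_bad) if truly_bad else 0.0
    return precision, recall

# Hypothetical: four events were flagged, three were actually malicious.
flagged = ["e1", "e2", "e3", "e4"]
truly_bad = ["e1", "e2", "e5"]
precision, recall = precision_recall(flagged, truly_bad)
print(precision, round(recall, 3))  # 0.5 0.667
```

Low precision is the fire hose problem, with analysts buried in alerts that lead nowhere; low recall means the one matchstick that matters was missed.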
Thank you so much for your time today and for being on Security Economy with me. It's been a great conversation.
Thank you, Katelyn. I enjoyed it.
And that's a wrap. Thank you for joining us for this episode of Security Economy. Check out our episode lineup at battleshipsecurity.com/blog, and don't forget to subscribe. See you next time.