This article explains how source scores are computed for material submitted directly to TruSTAR. For example, IOCs contained in email submissions or reports will receive a source score.
The source score is a quantification of the total value returned by an intelligence source to a private enclave. The scoring takes into consideration all of the historical data within each private enclave. Whenever new data is submitted, the score is recomputed.
- Normalized Indicator Scores explains how TruSTAR computes scores for indicators that have original scores from third-party intelligence sources.
- Priority Indicator Scores explains how TruSTAR computes priority scores for specific integrations. This scoring is available only through the Phishing Triage feature.
- Priority Event Scores explains how TruSTAR aggregates Normalized Scores for an event (such as an email) and assigns a score that reflects the overall severity of the event. This scoring is available only through the Phishing Triage feature.
The score is computed based on three different IOC scores:
- IP Overall Score.
- URL Overall Score.
- Hashes Overall Score.
Each of these individual scores can range from 0 to 100, and the overall intelligence source score is computed by averaging them.
Each IOC score is composed of two parts:
- Uniqueness Score.
- Timeliness Score.
Let’s start with the uniqueness score, using the IP overall score as the example. Suppose a private enclave contains 100 IPs, and 7 of them correlate with one or more of the intelligence sources A, B, and C. To compute the uniqueness score of source A, count the correlated IPs according to how many sources share them:
- 2 IPs appear only in source A.
- 2 IPs appear in sources A and B.
- 3 IPs appear in sources A, B, and C.
The raw uniqueness count is 2 + 2/2 + 3/3 = 4. In other words, you can think of the uniqueness score as a weighted sum over the indicators that correlate with a source, where each indicator carries the following weight:
1/(# of intelligence sources containing the indicator)
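The weighted sum above can be sketched in a few lines of Python. This is only an illustration, not TruSTAR's implementation: the function name `raw_uniqueness_counts`, the input shape (a mapping from each correlated indicator to the set of sources that contain it), and the sample IP addresses are all assumptions made for the example.

```python
from collections import defaultdict


def raw_uniqueness_counts(indicator_sources):
    """Return, per source, the sum of 1/(# sources containing the indicator).

    indicator_sources is a hypothetical input shape: it maps each correlated
    enclave indicator to the set of intelligence sources that contain it.
    """
    counts = defaultdict(float)
    for sources in indicator_sources.values():
        if not sources:
            continue  # indicator does not correlate with any source
        weight = 1.0 / len(sources)
        for source in sources:
            counts[source] += weight
    return dict(counts)


# The article's example: 2 IPs only in A, 2 in A and B, 3 in A, B, and C.
example = {
    "10.0.0.1": {"A"},
    "10.0.0.2": {"A"},
    "10.0.0.3": {"A", "B"},
    "10.0.0.4": {"A", "B"},
    "10.0.0.5": {"A", "B", "C"},
    "10.0.0.6": {"A", "B", "C"},
    "10.0.0.7": {"A", "B", "C"},
}
uniqueness = raw_uniqueness_counts(example)
```

For source A this yields 2 + 2/2 + 3/3, which is 4 up to floating-point rounding, matching the worked example.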
The timeliness score takes into consideration the time difference between the updated time of the private enclave report and the submission time of the source report. We assume that the report submission time corresponds to an incident time. If many intelligence sources and private enclave reports contain the same indicator, we pick the first submission in the private enclave and find the source report with the most recent enrichment. Again, you can think of timeliness as a weighted sum of correlations with a specific intelligence source, using the following weights:
1/(# days difference between the private enclave report and source report)
As these weights show, the contribution of each correlation is inversely proportional to the time difference in days. This prioritizes enrichment that is provided in a timely manner.
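A minimal sketch of the timeliness weighting follows. It rests on assumptions the article does not spell out: the helper names are hypothetical, the time difference is taken as an absolute value, and differences shorter than one day are clamped to one day to avoid dividing by zero.

```python
from datetime import datetime

SECONDS_PER_DAY = 86400.0


def timeliness_weight(enclave_time, source_time, min_days=1.0):
    """Weight of one correlation: 1 / (days between the two reports).

    Assumption: the difference is absolute and clamped to at least
    min_days so that same-day reports do not cause a division by zero.
    """
    delta_days = abs((enclave_time - source_time).total_seconds()) / SECONDS_PER_DAY
    return 1.0 / max(delta_days, min_days)


def raw_timeliness_count(report_pairs):
    """Sum the weights over (enclave report time, source report time) pairs."""
    return sum(timeliness_weight(e, s) for e, s in report_pairs)


# Three correlated indicators whose reports are 1, 2, and 4 days apart.
pairs = [
    (datetime(2020, 1, 2), datetime(2020, 1, 1)),
    (datetime(2020, 1, 3), datetime(2020, 1, 1)),
    (datetime(2020, 1, 5), datetime(2020, 1, 1)),
]
timeliness_count = raw_timeliness_count(pairs)  # 1 + 1/2 + 1/4
```

The closer in time the source report is to the enclave report, the more that correlation adds to the count.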
Once the timeliness and uniqueness counts are obtained, they are normalized by the total number of extracted IOCs of that type. For example, if the timeliness count and uniqueness count for IPs were 6 and 4, and the total number of extracted IPs was 100, the resulting raw_timeliness_score and raw_uniqueness_score are 6/100 and 4/100, respectively.
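The normalization step is a single division. The sketch below uses a hypothetical helper name, `normalize_count`, and adds an input check that the article does not mention.

```python
def normalize_count(weighted_count, total_iocs):
    """Divide a weighted count by the number of extracted IOCs of that type."""
    if total_iocs <= 0:
        # Assumption: an enclave with no IOCs of this type cannot be scored.
        raise ValueError("total_iocs must be positive")
    return weighted_count / total_iocs


# Values from the example: counts of 6 and 4 over 100 extracted IPs.
raw_timeliness_score = normalize_count(6, 100)
raw_uniqueness_score = normalize_count(4, 100)
```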
How It Works
In order to scale these scores to the 0-100 range, TruSTAR's team performed a study over all of the pairwise raw timeliness and uniqueness scores between all private enclaves and intelligence sources on the TruSTAR platform. Most scores were skewed towards small values and clustered in a tight range between 0 and 0.35. To make the data easier to interpret, we apply a natural logarithm (base e) transformation. To rescale to a 0 to 100 range, we then apply a scaling window to the log-transformed value and multiply the result by 100. Here raw_count is the weighted count before normalization:
(log(raw_count/(# iocs in the private enclave)) - window_start)/(window_end - window_start)
As mentioned earlier, window_start and window_end are obtained based on all the data on the platform. In our current example, if window_start = -14 and window_end = -1, the computed timeliness and uniqueness scores would be 86 and 83, for an average IP score of 84.5. The same computation is performed for URLs and hashes. If we have a URL score of 50.5 and a hash score of 10, the final intelligence source score for source A would be (84.5 + 50.5 + 10)/3, or approximately 48.
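The scaling and averaging can be checked with a short Python sketch. It assumes the window runs from -14 (lower bound) to -1 (upper bound), since that assignment reproduces the 86 and 83 in the example, and it assumes the windowed value is multiplied by 100. The function name `scale_score` is hypothetical.

```python
import math


def scale_score(raw_count, total_iocs, window_start, window_end):
    """Log-transform the normalized count and rescale it to a 0-100 range."""
    log_score = math.log(raw_count / total_iocs)  # natural logarithm
    return 100.0 * (log_score - window_start) / (window_end - window_start)


# Timeliness count 6 and uniqueness count 4 over 100 extracted IPs.
timeliness = scale_score(6, 100, window_start=-14, window_end=-1)
uniqueness = scale_score(4, 100, window_start=-14, window_end=-1)

# Round each component, then average them into the IP overall score.
ip_score = (round(timeliness) + round(uniqueness)) / 2

# Average the IP, URL, and hash scores into the source score.
url_score, hash_score = 50.5, 10.0
source_score = (ip_score + url_score + hash_score) / 3
```

Under these assumptions the two components round to 86 and 83, the IP score is 84.5, and the three-way average comes out a little above 48.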
With the above scoring methodology, you can end up with scores above 100. Based on our analysis, a log-transformed score above the window_end value (i.e. a final score above 100) could be indicative of duplicate data. In the future, we will issue a warning when this case arises.
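As a rough illustration of how such a case could be detected, the sketch below flags any scaled score above 100. The helper names, the default window of -14 to -1, and the multiplication by 100 are assumptions carried over from the worked example, not a description of the platform's behavior.

```python
import math


def scaled_score(raw_count, total_iocs, window_start=-14.0, window_end=-1.0):
    """Scale a normalized count to the 0-100 range (window values assumed)."""
    log_score = math.log(raw_count / total_iocs)
    return 100.0 * (log_score - window_start) / (window_end - window_start)


def possible_duplicate_data(score, upper_bound=100.0):
    """Return True when a score exceeds the top of the scale."""
    return score > upper_bound


normal = scaled_score(6, 100)       # log(0.06) is below window_end
suspicious = scaled_score(40, 100)  # log(0.40) is above window_end
flags = [possible_duplicate_data(normal), possible_duplicate_data(suspicious)]
```

With a window_end of -1, any normalized value above e^-1 (about 0.37) lands above 100, which sits just outside the 0 to 0.35 range where most observed scores cluster.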