Building an Observable-Query Intelligence Source Integration

Updated 1 month ago by Elvis Hovor

This document is a language-agnostic engineering guide for developers who want to integrate observable-query-style intelligence services with the TruSTAR platform. This guide is intended for developers who will host their own code in their own environment.

Architecture Notes

  • A 15-minute delay is required between runs of this script.
    • TruSTAR analyzes API traffic daily. API credentials whose activity does not show 15-minute breaks between script runs will be deactivated.
  • Only one set of TruSTAR API credentials can be used in a single process at any given time.
    • If the same set of TruSTAR credentials is used in multiple applications, processes, or threads, they will compete with each other for the limited number of per-minute API calls allowed to a credential set.
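
The 15-minute rule above can be enforced in the process itself rather than left to an external scheduler. The following is a minimal sketch, assuming the break is measured from the end of one run to the start of the next; the names seconds_until_next_run, run_forever, and run_once are hypothetical and not part of any TruSTAR SDK.

```python
import time

# Required break between script runs, per the architecture notes.
RUN_INTERVAL_SECONDS = 15 * 60


def seconds_until_next_run(last_run_finished: float, now: float) -> float:
    """Return how long to sleep so that runs are at least 15 minutes apart.

    Both arguments are readings from the same monotonic clock, in seconds.
    """
    elapsed = now - last_run_finished
    return max(0.0, RUN_INTERVAL_SECONDS - elapsed)


def run_forever(run_once) -> None:
    """Call run_once() repeatedly in a single process, pausing between runs.

    run_once is a hypothetical callable holding one full pass of the
    integration. Keeping everything in one loop also keeps one credential
    set in one process.
    """
    while True:
        run_once()
        finished = time.monotonic()
        time.sleep(seconds_until_next_run(finished, time.monotonic()))
```

Running the loop in one long-lived process also makes it harder to start two overlapping runs with the same credentials by accident.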

Configuration

Each customer using this integration will submit the following to the intel source provider:

trustar_service_user_api_key

trustar_service_user_api_secret

The TruSTAR API key and API secret for a TruSTAR service account.

Recommendations:

source_enclaves

One or more Enclave IDs that contain data the app submits to TruSTAR. This data can be logs, cases, emails, or whatever source data your organization generates that you want to use.

destination_enclave

A single Enclave ID where the app will store enriched data (Reports and/or Indicators).

Descriptive client_type, client_version, and client_metatag values are required.

API credentials whose activity does not show meaningful, descriptive values in those fields will be deactivated.
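
One way to hold these settings is a small typed container that reports missing values before any API call is made. This is a sketch only: the class name IntegrationConfig and the problems method are hypothetical, and the check covers presence of values, not whether the credentials actually have access to the enclaves (that still requires a call to TruSTAR).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IntegrationConfig:
    """Settings named in the Configuration section (container is illustrative)."""

    trustar_service_user_api_key: str
    trustar_service_user_api_secret: str
    source_enclaves: List[str]
    destination_enclave: str
    client_type: str
    client_version: str
    client_metatag: str

    def problems(self) -> List[str]:
        """Return human-readable configuration problems (empty list if none)."""
        found = []
        string_fields = (
            "trustar_service_user_api_key",
            "trustar_service_user_api_secret",
            "destination_enclave",
            "client_type",
            "client_version",
            "client_metatag",
        )
        for name in string_fields:
            if not getattr(self, name).strip():
                found.append(f"{name} must not be empty")
        if not self.source_enclaves:
            found.append("source_enclaves must list at least one Enclave ID")
        return found
```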

Pseudocode

  • Load configs and the checkpoint from persistent storage.
  • Validate configs (confirm proper access to the necessary enclaves).
  • Check the source enclaves for new reports using explicit from/to times (endpoint: search reports).
    • The search reports endpoint response is ordered from newest to oldest by each report's "lastUpdated" timestamp.
    • TruSTAR is an asynchronous platform, so "to time" should always be 15 minutes prior to the present moment. If "to time" is closer to the present moment than 15 minutes, you run the risk of missing data.
  • Aggregate all the report shells into a list/array.
  • Re-order the array from oldest to newest based on each report's "lastUpdated" timestamp (easier for checkpointing).
  • Declare consecutive_failures = 0.
  • For each report shell (oldest to newest):
    • Get the observables TruSTAR found in the report (endpoint: get indicators for report).
    • For all observables in the report:
      • Extract enrichment about the observable from all of the Intel Source's endpoint(s) appropriate for that observable type.
      • Transform the endpoints' enrichment to a single dictionary (enrichment dict; see the "TruSTAR Report Body Enrichment dict format" section for formatting).
      • Build the TruSTAR Report (API Docs, SDK Class) object, placing the enrichment dict in the Report's "body" attribute.
      • Upsert the Report to TruSTAR (see the "Report Upsert Algorithm" section).
      • Add attributes to the observable as TruSTAR Observable Tags:
        • Transform the endpoints' enrichment to a TruSTAR Indicator + Indicator Tags.
        • Submit the observable as an individual TruSTAR Indicator with Tags (endpoint: submit indicators).
    • Update the persistent-storage checkpoint with the value of the report's "lastUpdated" attribute.
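
The time-window, re-ordering, and checkpointing steps above can be sketched as follows. This is an illustration under assumptions: report shells are treated as plain dictionaries with a "lastUpdated" key, and handle_report and save_checkpoint are hypothetical callables standing in for the enrichment work and the persistent-storage write.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List, Tuple

# "to time" must trail the present moment by 15 minutes.
SAFETY_LAG = timedelta(minutes=15)


def search_window(checkpoint: datetime, now: datetime) -> Tuple[datetime, datetime]:
    """Return (from_time, to_time) for the search reports call."""
    to_time = now - SAFETY_LAG
    # Never let the window run backwards if the checkpoint is very recent.
    return checkpoint, max(checkpoint, to_time)


def oldest_first(report_shells: List[Dict]) -> List[Dict]:
    """Re-order shells from oldest to newest by their "lastUpdated" value."""
    return sorted(report_shells, key=lambda shell: shell["lastUpdated"])


def process_in_order(
    report_shells: List[Dict],
    handle_report: Callable[[Dict], None],
    save_checkpoint: Callable[[object], None],
) -> None:
    """Handle each shell oldest to newest, checkpointing after each one."""
    for shell in oldest_first(report_shells):
        handle_report(shell)
        # Saving after every report means a crash resumes from the last
        # fully processed report rather than from the start of the window.
        save_checkpoint(shell["lastUpdated"])
```

Because the shells are processed oldest first, the checkpoint only ever moves forward, so an interrupted run can restart from the stored value without skipping reports.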

Report Upsert Algorithm

  • Check whether a report about this observable already exists (endpoint: get report status), using the External ID (see the "TruSTAR Report External ID Format" section).
  • If the status is not UNKNOWN, SUBMISSION_SUCCESS, or SUBMISSION_FAILURE (that is, it is still SUBMISSION_PROCESSING):
    • Re-check every 5 seconds until it is one of those 3 statuses.
      • Time this "wait" loop out after 5 minutes. [case: TruSTAR is down for maintenance]
  • if UNKNOWN: submit new report (endpoint: submit report)
  • elif SUBMISSION_SUCCESS: update (overwrite the old one completely with the new) (endpoint: update report)
  • elif SUBMISSION_FAILURE:
    • try to get full report (endpoint: get report details) (using external ID)
    • if 200: update the report (endpoint: update report) [the report was once submitted & processed successfully, but the most recent update attempt failed]
    • if 404: submit the report as a new report (endpoint: submit report) [all attempts to submit this report have failed]
  • wait 5 sec before checking status (endpoint: get report status)
  • check status + wait until the submit/update is finished processing (either SUBMISSION_SUCCESS or SUBMISSION_FAILURE) (5-minute timeout)
    • if timeout: raise exception, terminate application. [TruSTAR is down for maintenance]
  • if SUBMISSION_SUCCESS:
    • consecutive_failures = 0
    • log, continue.
  • elif SUBMISSION_FAILURE:
    • if failure reason == too many observables or > 2MB:
      • shrink or remove the "related_observables" k/v pair.
      • upsert again. (careful, protect against infinite recursion.)
      • wait 5 sec.
      • check status, wait until SUBMISSION_SUCCESS or SUBMISSION_FAILURE.
      • if SUBMISSION_FAILURE:
        • log it
        • transform the TruSTAR Report object to a JSON string
        • dump the string to a file for human review later.
          • (bucket storage is ideal, so an unattended script doesn't fill up a host's hard drive)
        • consecutive_failures += 1
        • if consecutive_failures >= NUMBER_OF_CONSECUTIVE_PROCESSING_FAILURES_THAT_YOU_THINK_WARRANTS_HUMAN_INTERVENTION:
          • send your engineering team a Slack message.
          • automatically create a Jira ticket for the engineering team.
          • raise Exception, terminate the process.
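
Two pieces of the algorithm above lend themselves to small, testable functions: choosing between submit and update, and polling a status until it settles. The sketch below assumes the status values arrive as the strings used in this guide; choose_action, wait_for_status, and get_status are hypothetical names, and the actual TruSTAR calls are left to the caller.

```python
import time
from typing import Callable, Set

# Statuses that mean a submit or update has finished processing.
FINISHED = {"SUBMISSION_SUCCESS", "SUBMISSION_FAILURE"}


def choose_action(status: str, details_http_code: int = 0) -> str:
    """Return "submit" or "update" for a settled report status.

    details_http_code is the HTTP code from "get report details" and is
    only consulted when the status is SUBMISSION_FAILURE.
    """
    if status == "UNKNOWN":
        return "submit"
    if status == "SUBMISSION_SUCCESS":
        return "update"
    if status == "SUBMISSION_FAILURE":
        if details_http_code == 200:
            return "update"  # processed once before; last update failed
        if details_http_code == 404:
            return "submit"  # every earlier submission attempt failed
    raise ValueError(f"no action for status={status!r}, code={details_http_code}")


def wait_for_status(
    get_status: Callable[[], str],
    done: Set[str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_status() every poll_seconds until it returns a status in done.

    Raises TimeoutError after timeout_seconds (5 minutes by default), which
    the caller can treat as "TruSTAR is down for maintenance".
    """
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if status in done:
            return status
        if clock() >= deadline:
            raise TimeoutError("report status did not settle before the timeout")
        sleep(poll_seconds)
```

The same wait_for_status helper covers both waits in the algorithm: before the upsert, pass FINISHED plus "UNKNOWN" as the done set; after the upsert, pass FINISHED alone. Injecting sleep and clock keeps the loop testable without real delays.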

TruSTAR Report External ID Format

Your integration must give each Report an External ID that uniquely identifies it to TruSTAR. The External ID is what lets your app find and update a Report it previously submitted to the TruSTAR platform.

Formatting Rules

The External ID must contain only characters valid in URL strings, because some TruSTAR API endpoint calls use the External ID as a URL string parameter.

Recommended algorithm:

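  • Concatenate strings: observable value + destination enclave ID.
  • Base64-encode the result of the concatenation using the URL-safe alphabet ("-" and "_" in place of "+" and "/"). Standard Base64 output can contain "+", "/", and "=", which are not safe in a URL parameter without percent-encoding.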
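
A minimal sketch of the recommended algorithm follows. Because standard Base64 can emit "+", "/", and "=", this version uses Python's URL-safe variant and strips the "=" padding; stripping is a choice made here on the assumption that the ID is only ever compared, never decoded by TruSTAR. The function name make_external_id is hypothetical.

```python
import base64


def make_external_id(observable_value: str, destination_enclave_id: str) -> str:
    """Build a URL-safe External ID from an observable and an enclave ID."""
    raw = (observable_value + destination_enclave_id).encode("utf-8")
    # urlsafe_b64encode uses "-" and "_" instead of "+" and "/".
    encoded = base64.urlsafe_b64encode(raw).decode("ascii")
    # Drop "=" padding so the ID contains only letters, digits, "-" and "_".
    return encoded.rstrip("=")
```

The same observable and destination enclave always produce the same ID, which is what allows a later run to find and update the earlier Report.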

TruSTAR Report Body Enrichment dict format

{ subject_observable: {
      type: __________,
      value: _________,
      attribute_one: _______,
      a_second_attribute: ______,
      a_third_attribute: ______,
      ….etc….
  },

  related_observables: [
      { type: _____________,
        value: ____________,
        ……etc…..
      },
      { type: __________,
        value: _________,
        some_attribute: _______,
        another_attribute: ______
      }
  ]
}
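
The format above maps directly onto a nested dictionary, and the "shrink or remove related_observables" step from the upsert algorithm can operate on that dictionary before it is placed in the Report body. The sketch below is illustrative only: the helper names are hypothetical, the 2 MB limit is taken as 2 * 1024 * 1024 bytes, and the size is measured on a JSON serialization, both of which are assumptions about how TruSTAR counts.

```python
import json
from typing import Any, Dict, List

# Assumed byte value for the "> 2MB" failure case.
MAX_BODY_BYTES = 2 * 1024 * 1024


def build_enrichment(subject: Dict[str, Any], related: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Assemble the enrichment dict in the documented shape."""
    return {"subject_observable": subject, "related_observables": related}


def body_size(enrichment: Dict[str, Any]) -> int:
    """Size in bytes of the enrichment dict when serialized as JSON."""
    return len(json.dumps(enrichment).encode("utf-8"))


def shrink_related(enrichment: Dict[str, Any], max_bytes: int = MAX_BODY_BYTES) -> Dict[str, Any]:
    """Return a copy that fits in max_bytes, trimming related_observables.

    Entries are dropped from the end of the list; if none can stay, the
    "related_observables" key is removed entirely.
    """
    trimmed = dict(enrichment)
    related = list(trimmed.get("related_observables", []))
    while related and body_size({**trimmed, "related_observables": related}) > max_bytes:
        related.pop()
    if related:
        trimmed["related_observables"] = related
    else:
        trimmed.pop("related_observables", None)
    return trimmed
```

Since the function returns a copy and only ever shortens the list, calling it once before a single retry is a simple way to avoid the infinite recursion the upsert algorithm warns about.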

