STFC SOC Technical Meeting

Tuesday 16 November 2021, 9:30 – 10:30 (Europe/London)

STFC SOC Technical Meeting Minutes

Attending

[David Crooks - DC]                     [Jonathan Churchill - JC]

[Greg Corbett - GC]                      [Alastair Dewhurst - AD]

[Anish Mudaraddi - AM]             [Ian Collier - IC]

[Olivier Restuccia - OR]              [James Adams - JA]

Announcements

Weekly meeting with a rotating set of attendees, depending on needs.

Agenda for today: updates on the physical aspect and network layout, procurement, deployment, and AOB.

Discussion

Physical aspect and network layout

DC: no change regarding rack 245. Cristina has requested help clearing it; will talk to AD and Martin Bly.

The logical and physical diagrams are complete and will be circulated.

Deployment

OpenSearch:

AM: the IAM issue is resolved (the Aquilon config was not taking effect: the key was changed but then overwritten). A test admin group has been set up; we still need to discuss how to structure the groups and what permissions people should have.

DC: suggest we put a grouping in place now, but this needs further thought as we proceed. We should define a security group in IAM with subgroups for different functions (though this is out of scope at the current stage of deployment).

AM: now finishing the Aquilon config for it; the cluster is set up and this should be done soon.

GC: proceeding on the assumption that the SOC services will have their own archetype, and we will move this config to that shared archetype when the time comes.

DC: on cluster deployment and the virtual cluster, we still need to combine the existing cloud projects; then GC, OR and I will set it up. The task will be to work out what we need and how to deploy the virtual cluster. Could OR set up a 30-minute meeting with GC and me to discuss this?

After discussion with James Adams, we will use a new archetype: secops. It will have a restricted set of admins, firewalls, and SELinux turned off by default (with the option to turn it on). The goal is for this to be useful for the SOC and other groups; things tested in the SOC deployment may find their way into other archetypes over time.

Zeek development:

DC: will look at Kafka after the cloud consolidation.

OR: the performance monitoring dashboard is set up; still need to add a few custom metrics. Started looking at Zeek logs to get familiar with them and see what data is available.

DC: at ZeekWeek, ESnet reported that they are testing DPDK to split network traffic across all CPU cores. It is much quicker than af_packet and requires less configuration.

Procurement

DC: as of Friday, AD has checked in with the budget controller with an estimate of 254K, which is within budget.

The plan was dual 100Gbit for data ingest, 25Gbit for the internal rack network and 1Gbit for the firewall and outputs. The 25Gbit cards have a longer lead time, so we will get four 100Gbit ports instead, which may be overkill but should mean we never have network speed issues.

AD: for the ConnectX-6 cards from Dell, does the letter at the end matter?

JA: yes, we want Dx (the newest one) or EN (the standard one). Will send a list of acceptable hardware.

AD: lots of progress: we have a full parts list with reasonably accurate costs. The 25Gbit cards had a lead time of more than 65 days, while some of the 100Gbit cards are in stock. In terms of memory, we may end up buying two machines with lots of memory, then taking half of it out and putting it into the old machines.

So the Zeek nodes will now have two 100Gbit cards instead of a 25Gbit and a 100Gbit card, and will use a 100Gbit port instead of the splitter.

DC: sounds good; we could use the ports directly, or still use splitters anyway to keep ports available.

AD: do we have any information on PG’s costs?

DC: not yet, will get in touch. Need to set up another project meeting with AD, Paul and others.

Is the 254K based on previous calculations, or does it include recent discussions with Dell?

AD: it includes recent discussions with Dell. We have multiple conversations ongoing with Dell, not just about SOC orders, and it would be good to combine them as much as possible.

AD: the big Tier1 order is due at the beginning of December, so if need be we could borrow some nodes from that.

DC: we need to set a date for installing the first set of nodes in the rack and disposing of the existing hardware.

AD: need to discuss this urgently with Martin Bly to get it added to the list. Need a handoff from JC.

The rack currently doesn’t have a UPS power feed. Adding one would cost about £1000 and would protect the rack from a 30-second power glitch; in a full power outage the rack would still shut down after 10-15 minutes (once the room gets too hot).

Actions

DC: set up a task list to work through

DC: combine the existing cloud projects

DC: set up a general project meeting with AD, Paul and others

DC: email Cristina, Martin Bly, JC and AD about the rack 245 handover and the addition of a UPS power feed

OR: set up a meeting with GC and DC to discuss the SOC virtual cloud setup

OR: look into DPDK for splitting packets across multiple cores (also to discuss with Jouker)

Timetable

1. General matters
   • Review actions
   • Urgent updates
   • Agree agenda
2. Hardware/network update
3. Zeek update
4. Elasticsearch update
5. Other updates/AOB