[2023 kcdc] DRYing out your GitLab Pipeline

Speaker: Lynn Owens

For more, see the table of contents.


Intro/Problem

  • Every GitLab project has its own .gitlab-ci.yml file. Great for getting started
  • You quickly end up with hundreds of projects
  • The goal is to eliminate copy/paste by centralizing the pipeline code in a few projects

What NAIC has

  • 200+ projects maintained by 11 teams in 2 dev orgs
  • Pipeline is inner source
  • Version 6 of pipeline; working on version 7
  • Reduced the maintenance burden by making a change once instead of in each project
  • Hosted directly on gitlab.com

Milestone 1 – Hidden jobs for pipeline project

  • GitLab has “hidden” jobs
  • Their names start with a period
  • They don’t appear in any pipeline; they exist just to hold the common code
  • The “pipeline” project has a .gitlab-ci-base.yml which contains common code
  • Common code makes no assumptions about teams and is configurable for all known use cases
  • v1 was about two dozen lines of common code
  • The client projects include the pipeline code (the include can reference any project on the GitLab instance, so it doesn’t need to be your own)
include:
  - project: 'NAIC/pipeline'
    file: '.gitlab-ci-base.yml'
  • Then client projects add jobs that extend the hidden jobs to call the functionality in the base code. Here .deploy_s3 is the hidden job defined in the base code:
deploy_foo:
  stage: deploy
  extends: .deploy_s3
  variables:
    ...

Suggested practices

  • He advises against pinning the pipeline include to a tag because projects then don’t get bug fixes and everyone has to upgrade manually
  • Don’t define stages in the shared pipeline, as that forces one opinion on everyone. Many groups had written a pipeline for their own use case, and they were not all the same.

Milestone 2 – Profiles

  • Found a half dozen use cases, ex: Maven for Java, NPM building Angular, etc.
  • Within a use case, each project’s .gitlab-ci.yml was a copy/paste of the others.
  • Made profiles/maven-java.yml and the like in the common pipeline project
  • Profiles are not one size fits all; there are a bunch of different ones, and projects can still use the milestone 1 approach.

Milestone 3 – Pipeline scripts

  • Common code like logging, calling REST APIs, etc.
  • Switched from bash scripting to Python so the common code could live in modules and the modules could be unit tested (see the sketch below)
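
The talk didn’t show the actual script code, so this is just a minimal sketch (module, function, and test names are hypothetical) of the kind of shared Python module that becomes unit testable once you move off bash:

# pipeline_scripts/versions.py - hypothetical shared module
def is_release_tag(ref: str) -> bool:
    """Return True if a git ref looks like a semantic version tag such as v6.1.0."""
    parts = ref.removeprefix("v").split(".")
    return len(parts) == 3 and all(part.isdigit() for part in parts)

# test_versions.py - the payoff of Python over bash: plain pytest-style unit tests
from pipeline_scripts.versions import is_release_tag

def test_release_tag():
    assert is_release_tag("v6.1.0")
    assert not is_release_tag("feature-branch")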

Options to get scripts

  • Could have the pipeline create a tar/zip archive and upload it to a repository. This is a little slow.
  • Could have a global before_script that does a git clone of pipeline-scripts. Uses a network connection.
  • Could bake the scripts into an image. Requires a pipeline for the image.

If he were doing it again, he wouldn’t create a separate pipeline-scripts project because it is tightly coupled to the pipeline. That doesn’t change the problem of getting the scripts into jobs, though.

Testing

  • If client projects are all using the default branch, small changes will affect them all.
  • Use a testing framework for script code (ex: python/go)
  • Follow development practices
  • Write a sample app for each profile. Have the common pipeline trigger a downstream pipeline on this project. For any merge to master, the downstream jobs must pass.
  • Before major refactors, inventory the profile jobs and audit them afterwards.

Milestone 4 – Profile Fragments

  • Had about 24 profiles (ex: maven-java-jar, maven-java-pom, maven-java-k8s, etc)
  • Typically three components – build tool, language, deployment method
  • These profiles had a lot of copy/paste
  • Decomposed them into fragments (ex: maven, npm, java, angular, k8s, s3)

Selling the idea

  • Needed to convince people to use this pipeline instead of writing their own or using another team’s.
  • Offer flexibility
  • Show value
  • Follow semantic versioning to a T (he tags every merge to master of the pipeline even though he encourages use of the default branch; the tags are good rollback points or for when a project needs something older)
  • Changelog everything
  • Document well
  • Train and evangelize
  • Record training sessions so you have a library

My take

This was a good case study and useful to see concrete examples and techniques. I wish we could see the code, but I understand that belongs to their org.

[2023 kcdc] CVE 101: the unfolding of a zero day attack

Speaker: Theresa Mammarella

Twitter: @t_mammarella

For more, see the table of contents.


Notes

  • Annual cost of cybercrime is predicted to top $8 trillion. Only the US and China have a GDP larger than that.

Terminology

  • Vulnerability – weakness/flaw in system
  • Threat – attack vector, potential action
  • Risk – probable frequency of that loss.
  • The goal of cybersecurity is to minimize risk. We can’t control the intent to do harm, so focus on the vulnerabilities

CVEs

  • CVE – Common Vulnerabilities and Exposures
  • Format is CVE-xxxx-yyyyy, where xxxx is the year it came out and yyyyy is the identifier
  • CVSS scoring – how bad is it on a scale of 0-10. Ten is worst
  • A CVSS score has three parts – base (exploitability, impact), temporal, and environmental. Good description here
  • The base score is the one we see on the CVE
  • A CVE can be rejected. The number stays used and cannot be reused. Example: someone thought they had found a vulnerability, but the investigation was flawed and it wasn’t an actual issue. Story about it here.

How to talk about

  • Private disclosure – organization can choose when/whether to fix/share
  • Coordinated/responsible disclosure – best practice – agreed upon time frame
  • Full/public disclosure – share everything
  • Best to report via the company website, a security.md file, security files on the server, or GitHub private vulnerability reporting

Zero day vulnerability

Examples

  • Log4Shell – remote code execution. It was reported responsibly, but the fix was incomplete, so there were zero days on those CVEs
  • A vulnerability could be as simple as a missing bounds check, as with OpenSSL. They announced that something big was coming and to get ready. When it was announced, we learned it only affected OpenSSL 3 (not earlier versions) and was rated high, not critical, so it was a boy-who-cried-wolf situation.

Security Practices for Developers

  • Insider threat includes poor training
  • There are a lot more developers than info security people, so it is increasingly hard for security teams to keep up.
  • Cost of finding and fixing bugs increases over time
  • Ask: does this touch the internet? Does it take untrusted input? Does it handle sensitive data?
  • OWASP Top 10. Updated in 2021 to add insecure design, software/data integrity failures, and server-side request forgery (SSRF). Some categories were merged, such as injection.
  • OWASP is starting a Top 10 for Large Language Model Applications. A draft version is available
  • mitre/hipcheck – scorecard for supply chain risk. Similarly, Sonatype security rating and OpenSSF Scorecard
  • Open source dependency management matters because open source is embedded in many projects; 90% of an app is open source on average. North Korea attacked many apps, including PuTTY

Attack types

  • Typosquatting – look-alike name with one or two wrong characters
  • Open source repo attacks – attempts to get malware/weaknesses added into dependency source
  • Build tool attacks
  • Dependency confusion – a different version that shows up as the latest

Trust?

  • Sometimes third-party projects can help assess trust. ex: OpenSSF Scorecard
  • NPM and PyPI often have supply chain attacks; Maven Central less so
  • Scanning tools to find issues can be helpful
  • You are responsible when things go wrong

My take

Good talk. Covered concepts and good real life examples. I learned a few things like the OWASP Top 10 for LLMs. Appreciated the shout out to “the Java people in the front row” when talking about log4j. I added a few links in my blog that weren’t in the original presentation for things I wanted to learn more about.

[2023 kcdc] data leakage – why your ML model knows too much

Speaker: Leah Berg

For more, see the table of contents.


Notes

Data Leakage

  • Also known as leakage or target leakage
  • Different meaning for information security (data leaking to outside organization)
  • Can be difficult to spot
  • Training data includes information about the test data.
  • Or the model is trained on information that isn’t available in production

How models learn

  • Split data into training data and test data.
  • Test data – data the model has never seen before; used to make sure the model gets it right
  • Can also have an optional validation set
  • Randomly pick whether each data point goes into training or test data – called a random train/test split (see the sketch below)
  • More training data than test data
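
As a minimal illustration (toy data, not from the talk), a random train/test split with scikit-learn’s train_test_split looks like this:

# Random 80/20 train/test split on toy data
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)

# The test rows are held back so the model never sees them during training
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)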

Don’t include data from the future

  • Using a random split on time series data doesn’t work because the model has learned about future data.
  • Better to use a sliding window: use the first few months to predict the next month, then add that next value and predict the one after, and keep going. Adding up the errors gives you the accuracy of the model (see the sketch after this list).
  • This works because the model only knows about data from before the point it is asked to predict.
  • Create a timeline of when events happen. That way you make sure you aren’t using data from after the prediction point.
  • You don’t always know where/when data was created, so it is important to understand the business process
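
scikit-learn’s TimeSeriesSplit is one ready-made way to get this behavior; it is a stand-in for the sliding-window approach described above, not the speaker’s exact code (toy data):

# Time-ordered splits: always train on the past, validate on the future
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(12).reshape(-1, 1)   # 12 observations in time order
y = np.arange(12)

for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    # training indices always come before the test indices, so no future data leaks in
    print("train:", train_idx, "test:", test_idx)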

Don’t randomly split groups

  • With a random split, some data from each group lands in training, so the model has already seen the group it is later asked to predict
  • The problem appears when a new student (group) shows up; the prediction for them will be bad
  • scikit-learn has GroupShuffleSplit() to keep a full group in the same set – either testing or training (see the sketch below)
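
A minimal sketch of GroupShuffleSplit on toy data (the group ids are hypothetical, e.g. one per student):

# Keep each group entirely in either the training set or the test set
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.arange(8).reshape(-1, 1)
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])
groups = np.array([1, 1, 2, 2, 3, 3, 4, 4])   # e.g., one id per student

splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))
# every group id ends up in exactly one of the two index arrays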

Don’t forget your data is a snapshot

  • In school, have pristine data set.
  • In real world, data is always changing.
  • You could accidentally tell the model about data that occurred after the prediction. Again, think about the data on a timeline

Don’t randomly split data when retraining

  • You want to use the same training/test data on the production and challenger models to see which is better.
  • Otherwise one model has already seen data points during training that you are now testing on, so you don’t know if it is really better.
  • The challenger model can get more data that wasn’t available originally. It is OK to split the new data into test/train as long as the original data is split the same way as before (see the sketch below).
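
One way to do this (a sketch under the assumption that the original rows and random_state are unchanged; not the speaker’s code) is to reproduce the original split and give only the new data a fresh split:

# Keep the original split stable; only the newly collected data gets a fresh split
import numpy as np
from sklearn.model_selection import train_test_split

X_old, y_old = np.arange(20).reshape(-1, 1), np.arange(20) % 2      # original data
X_new, y_new = np.arange(20, 30).reshape(-1, 1), np.arange(10) % 2  # collected since then

# Re-running with the same inputs and random_state reproduces the original split
X_tr_old, X_te_old, y_tr_old, y_te_old = train_test_split(X_old, y_old, test_size=0.2, random_state=42)
X_tr_new, X_te_new, y_tr_new, y_te_new = train_test_split(X_new, y_new, test_size=0.2, random_state=42)

X_train, y_train = np.vstack([X_tr_old, X_tr_new]), np.concatenate([y_tr_old, y_tr_new])
X_test, y_test = np.vstack([X_te_old, X_te_new]), np.concatenate([y_te_old, y_te_new])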

Split data immediately

  • It is risky to rescale before the split: the min/max of the full data set can differ from the min/max of just the training data, so scaling first leaks information from the test data
  • Run the normalization separately on the split sets: fit it on the training data, then apply it to the test data (see the sketch below)
  • Before the split, do analysis with the business and exploratory data analysis. Split the data before you start modeling
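
A minimal sketch of scaling after the split, fitting the scaler on the training data only (toy data):

# Fit the scaler on training data only, then apply the same scaling to the test data
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 3)), rng.integers(0, 2, size=100)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)  # min/max come from the training data only
X_test_scaled = scaler.transform(X_test)        # the test data never influences the scaling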

Use Cross Validation

  • KFold validation – split the training data into K parts
  • ex: 3-fold validation – two parts stay as training and one becomes validation. The test data remains separate and is kept for the final evaluation.
  • The validation set is for an initial test.
  • Gives more options to train the model (see the sketch below)
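
A minimal sketch of 3-fold cross validation on the training portion while the test set stays untouched (toy data):

# 3-fold cross validation on the training data; the held-out test set is not touched
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

X, y = make_classification(n_samples=300, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train,
                         cv=KFold(n_splits=3, shuffle=True, random_state=42))
print(scores.mean())   # validation estimate; the final check still uses X_test/y_test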

Be Skeptical of High Performance

  • If the validation score is much higher than the train/test scores, be suspicious.
  • If the train/test/validation scores are all high or all the same, be suspicious.

Use scikit-learn pipeline

  • Helps avoid leaking test data into the training data by keeping preprocessing steps inside the pipeline (see the sketch below)
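
A minimal sketch: with the scaler and model wrapped in a Pipeline, each cross-validation fold refits the scaler on that fold’s training data only (toy data):

# The Pipeline refits the scaler inside each CV fold, so validation folds never leak into it
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, random_state=42)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
print(cross_val_score(pipe, X, y, cv=5).mean())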

Check for features correlated with target

  • If another attribute is a very close match to what you are looking for, make sure you aren’t mixing up correlation and causation.
  • Also, avoid timeline errors from reverse causation. Ex: the thing you are looking for causes something else (see the sketch below)
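
One quick check (column names here are hypothetical) is to look at each feature’s correlation with the target and investigate anything that matches too well:

# Check feature/target correlations; a near-perfect correlation deserves investigation
import pandas as pd

df = pd.DataFrame({
    "feature_a": [1, 2, 3, 4, 5],
    "feature_b": [5, 3, 4, 1, 2],
    "target":    [1, 2, 3, 4, 5],   # feature_a matches the target exactly - ask why
})
print(df.corr()["target"].drop("target").sort_values(ascending=False))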

My take

Great talk. Almost all of this was new to me. It was understandable and I learned a lot.