
Unlocking the Power of GeoLite2 Free Geolocation Data with GitHub Actions and Azure IP Groups

  • Writer: Shannon Ford
  • Jan 3, 2024
  • 7 min read

We recently needed to group all of a country's known IP blocks so we could use them in some specific Azure IP Groups for whitelisting.


Enter Maxmind's GeoLite2 Free Geolocation Data service (https://dev.maxmind.com/geoip/geolite2-free-geolocation-data).


Maxmind gathers up all of the country IPs, groups them into geo blocks, and updates those lists every Tuesday and Friday. All you need is a Maxmind account and a license key (both free).


Maxmind's data comes as a .zip of CSV files, and we'll need to do some adjusting to meet our needs. Our team was specifically after the data in GeoLite2-Country-Locations-en.csv and GeoLite2-Country-Blocks-IPv4.csv, but you could certainly modify the code below to work with any or all of the files in the archive.


[Screenshot: contents of GeoLite2-Country-CSV_20231031.zip]


Let's Get The Geo Data By Country Gathered Up And Output To .txt Files So Terraform Can Consume It Easily


So I wrote some PowerShell that downloads the current database every Wednesday morning, unzips it into a GitHub repo, and splits the lists into one file per country. Each file name includes both the ISO country code and the full country name, for use in other pipelines.


What the maxmind.ps1 PowerShell script does step by step:

  1. Downloads the current Maxmind GeoLite2 free database using our license key. The license key is stored as a GitHub Actions secret and passed to the script as the MAXMIND_LICENSE_KEY environment variable during the pipeline run.

  2. Runs Get-ChildItem with a *.zip filter to get the downloaded zip as a file object (name and path), then writes the file name to the GitHub step output (mmzipname) so we can use it to label the repo update later.

  3. Checks that the .zip file from Maxmind exists. Then, before unzipping anything, it makes sure the ./$env:STR_IP_LIST_FOLDER_NAME folder exists AND is empty. $env:STR_IP_LIST_FOLDER_NAME is a GitHub Action variable that holds the folder to unzip the files into, so it can be changed from the GitHub Action pipeline with no code changes. If the folder already exists, the script deletes and recreates it so no country .txt files linger from previous runs.

  4. Unzips each file we're interested in, GeoLite2-Country-Locations-en.csv and GeoLite2-Country-Blocks-IPv4.csv, as listed in the comma-separated GitHub Action variable FILES_TO_EXTRACT. That way we can change the file names down the road with no code changes needed - yay!

  5. Once those files are extracted, groups the IP blocks by country using a DataTable in PowerShell, based on this article (https://www.spjeff.com/2015/04/21/datatable-in-powershell-for-crazy-fast-filters). It's CRAZY fast.

  6. After the DataTable is loaded from the blocks CSV, writes each country's networks to a file named like IN_India.txt. That way other pipelines can split the name and use the ISO country code in code, while matching on the country name (ex: India) to find the file they need to import - sweet!

  7. Cleans up all unneeded files before we commit back to the repo.
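Here is a minimal Python sketch of steps 5 and 6 for anyone who wants the core idea outside PowerShell. It swaps the DataTable for a plain dictionary that indexes every network by geoname_id in a single pass. It then writes one <ISO code>_<country name>.txt file per country. The sketch assumes the two CSVs carry the network, geoname_id, country_iso_code and country_name columns that the script reads. The function names are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path


def export_country_files(blocks_csv: Path, locations_csv: Path, out_dir: Path) -> dict[str, int]:
    """Write one <ISO code>_<country name>.txt per country; return network counts per file name."""
    # Single pass over the blocks file: index every network by its geoname_id
    networks_by_geoname: dict[str, list[str]] = defaultdict(list)
    with blocks_csv.open(newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            networks_by_geoname[row["geoname_id"]].append(row["network"])

    counts: dict[str, int] = {}
    with locations_csv.open(newline="", encoding="utf-8") as f:
        for loc in csv.DictReader(f):
            # Continent-level rows have an empty country_name, as the script's if-check assumes
            if not loc["country_name"]:
                continue
            networks = networks_by_geoname.get(loc["geoname_id"], [])
            if not networks:
                continue  # no IPv4 blocks for this country, so no file
            name = f"{loc['country_iso_code']}_{loc['country_name']}.txt"
            (out_dir / name).write_text("\n".join(networks) + "\n", encoding="utf-8")
            counts[name] = len(networks)
    return counts


def split_country_file_name(file_name: str) -> tuple[str, str]:
    """'IN_India.txt' -> ('IN', 'India'): the ISO code is everything before the first underscore."""
    code, _, country = file_name.removesuffix(".txt").partition("_")
    return code, country
```

The dictionary lookup per country is constant time. A DataTable.Select call, by contrast, filters the table once per country.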


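GitHub Action variables used by the script:

FILES_TO_EXTRACT=GeoLite2-Country-Locations-en.csv,GeoLite2-Country-Blocks-IPv4.csv
STR_IP_LIST_FOLDER_NAME=ip_lists_by_country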

maxmind.ps1

#Add Zip Assembly
Add-Type -AssemblyName System.IO.Compression.FileSystem

#Download File (-L follows any redirect, -J -O saves it under the server-supplied file name)
$strcurlURL = 'https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-Country-CSV&license_key={0}&suffix=zip' -f $env:MAXMIND_LICENSE_KEY
curl -L -J -O $strcurlURL

#Get Name and Path Of Zip Downloaded From MaxMind
$objZipFile = Get-ChildItem -Filter '*.zip'

#Output Current Maxmind Zip File Name To Github Output
$objGithub_Output = "mmzipname={0}" -f $objZipFile.name
echo $objGithub_Output >> $env:GITHUB_OUTPUT

#Unzip Files
If (Test-Path -Path $objZipFile.FullName) {
    #Make Sure Final Folder For <country>.txt Files Exists And Is Empty
    If (!(Test-Path -Path "./$env:STR_IP_LIST_FOLDER_NAME")) {
        Write-Output "Creating Folder For First Run"
        New-Item -Path "./$env:STR_IP_LIST_FOLDER_NAME" -ItemType Directory
    }
    Else {
        Write-Output "Removed Previous Run .txt Files"
        Remove-Item -Path "./$env:STR_IP_LIST_FOLDER_NAME" -Recurse
        New-Item -Path "./$env:STR_IP_LIST_FOLDER_NAME" -ItemType Directory
    }    

    #Extract Each File Defined In The Pipeline Variable
    $env:FILES_TO_EXTRACT -split (',') | ForEach-Object {
        Write-Output "Extracting $_ From maxmind.zip"

        #Get Contents of the Zip File
        $ZipContents = [System.IO.Compression.ZipFile]::OpenRead($objZipFile.FullName) 
        foreach ($entry in $ZipContents.Entries) {
            If ($entry.Name -eq $_) { $entryname = $entry.FullName }
        }
    
        #Get the specific entry (file) from the zip archive
        $SpecificFile = $ZipContents.GetEntry($entryName)
           
        #Get Contents of the File
        $inputStream = $SpecificFile.Open()
        $outputStream = [System.IO.File]::Create("./$env:STR_IP_LIST_FOLDER_NAME/$_")
         
        #Copy the content from the input stream to the output stream
        $inputStream.CopyTo($outputStream)
                
        $inputStream.Close()
        $outputStream.Close()
        $ZipContents.Dispose()
    }

    #Define the DataTable Columns In A Datatable (https://www.spjeff.com/2015/04/21/datatable-in-powershell-for-crazy-fast-filters/)
    Write-Output "Creating Datatable From CSV File"
    $ipv4table = New-Object system.Data.DataTable 'CountryBlocksIPv4'  
    @("network", "geoname_id") | ForEach-Object {
        $ipv4table.columns.add($(New-Object system.Data.DataColumn $_, ([string])))
    }

    #Load Datatable With CSV Info
    $objCountryIPv4Blocks = Import-Csv -Path "./$env:STR_IP_LIST_FOLDER_NAME/GeoLite2-Country-Blocks-IPv4.csv"
    $objCountryIPv4Blocks | ForEach-Object { 
        $row = $ipv4table.NewRow()  
        $row.network = $_.network
        $row.geoname_id = $_.geoname_id
        $ipv4table.Rows.Add($row)
    }

    #Write Out Each Country To Its Own Text File
    $objCountryLocs = Import-Csv -Path "./$env:STR_IP_LIST_FOLDER_NAME/GeoLite2-Country-Locations-en.csv"
    $objCountryLocs | ForEach-Object {
        if ($_.country_name) {
            Write-Output "Exporting Networks For $($_.country_name)"
            $strFilePath = "./$env:STR_IP_LIST_FOLDER_NAME/{0}_{1}.txt" -f $_.country_iso_code, $_.country_name
            $ipv4table.Select("geoname_id='{0}'" -f $_.geoname_id) | Select-Object -ExpandProperty network | Add-Content -Path $strFilePath 
        }
    }

    #Cleanup UnNeeded Files
    $env:FILES_TO_EXTRACT -split (',') | ForEach-Object {
        Remove-Item -Path "./$env:STR_IP_LIST_FOLDER_NAME/$_"
    }
    $objZipFile.FullName | Remove-Item
}
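The extraction step matches zip entries on their bare file name (entry.Name) but opens them by full path. This lets it find the CSVs even when they sit inside a folder in the archive. Below is a hypothetical Python equivalent built on the standard zipfile module. It streams each requested entry to disk rather than unpacking the whole archive, and it fails loudly when a name is missing. The function name and its parameters are assumptions for illustration.

```python
import shutil
import zipfile
from pathlib import Path, PurePosixPath


def extract_by_name(zip_path: Path, wanted: list[str], out_dir: Path) -> list[Path]:
    """Extract the named files from zip_path into out_dir, whatever folder they sit in."""
    written: list[Path] = []
    with zipfile.ZipFile(zip_path) as zf:
        # Map bare file names (zip entry paths always use "/") to their archive entries
        by_name = {PurePosixPath(info.filename).name: info for info in zf.infolist() if not info.is_dir()}
        for name in wanted:
            info = by_name.get(name)
            if info is None:
                # Fail loudly rather than silently reusing a previous entry
                raise FileNotFoundError(f"{name} not found in {zip_path.name}")
            target = out_dir / name
            # Stream the single entry to disk instead of unpacking the whole archive
            with zf.open(info) as src, target.open("wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```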

Sample output of maxmind.ps1 from the GitHub Action:

Run ./maxmind.ps1
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100 3214k  100 3214k    0     0  8494k      0 --:--:-- --:--:-- --:--:-- 8525k
Removed Previous Run .txt Files

    Directory: D:\a\<repo name>\<repo name>

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d----            1/3/2024  4:03 AM                ip_lists_by_country
Extracting GeoLite2-Country-Locations-en.csv From maxmind.zip
Extracting GeoLite2-Country-Blocks-IPv4.csv From maxmind.zip
Creating Datatable From CSV File
Exporting Networks For Rwanda
Exporting Networks For Somalia
Exporting Networks For Yemen
Exporting Networks For Iraq
Exporting Networks For Saudi Arabia
Exporting Networks For Iran
Exporting Networks For Cyprus
Exporting Networks For Tanzania
Exporting Networks For Syria
Exporting Networks For Armenia
Exporting Networks For Kenya
Exporting Networks For DR Congo
Exporting Networks For Djibouti

The best part is, we're doing all of this sorting and grouping of well over 1 million records in about 2 min total. Fast!!


GitHub Action YAML:


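name: Process Maxmind GeoLite2 Country CSV Information

on:
  workflow_dispatch:
  schedule:
    #Every Wed at 4 am UTC (GitHub Actions cron schedules run in UTC)
    - cron: "0 4 * * 3"

#The push step needs write access to the repo contents
permissions:
  contents: write

jobs: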
  maxmind:
    name: Update Country CSV WhiteLists    
    runs-on: windows-latest
    env:
      MAXMIND_LICENSE_KEY: ${{secrets.MAXMIND_LICENSE_KEY}}
      FILES_TO_EXTRACT: ${{vars.FILES_TO_EXTRACT}}
      STR_IP_LIST_FOLDER_NAME: ${{vars.STR_IP_LIST_FOLDER_NAME}}
      SLACK_WEBHOOK_URL: ${{secrets.SLACK_WEBHOOK_URL}}
      SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK
      
    steps:     
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: 'Download And Process Maxmind CSV Zip File'
        id: posh
        shell: pwsh
        run: ./maxmind.ps1

      - name: Add Files and Push To Repo
        run: |
          git config user.name github-actions
          git config user.email github-actions@github.com
          git add .
          git commit -m "Updated Repo With Current Geolite2 Data: ${{ steps.posh.outputs.mmzipname }}"
          git push -u origin main

      - name: Notify Of Maxmind Update
        id: slack
        uses: slackapi/slack-github-action@v1.24.0
        with:
          payload: |
            {
              "text": "Maxmind Ipv4 Whitelists Per Country Updated",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "Updated Github Repository: *${{github.event.repository.name}}* \n\nUsing Maxmind File: *${{ steps.posh.outputs.mmzipname }}*"
                  }
                }
              ]
            }          


Note the commit message, git commit -m "Updated Repo With Current Geolite2 Data: ${{ steps.posh.outputs.mmzipname }}". It records which dated Maxmind file the IPs came from. There sure are a LOT of IP changes from one week to the next!
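The zip name carries a date stamp (GeoLite2-Country-CSV_20231031.zip), so the same mmzipname value can also be turned into a real date if you ever want to compare runs. Here is a minimal Python sketch, assuming the _YYYYMMDD.zip suffix holds. It also shows the single-line key=value format that the script's echo appends to the file named by $GITHUB_OUTPUT. Both function names are hypothetical.

```python
import os
import re
from datetime import date, datetime


def zip_release_date(zip_name: str) -> date:
    """Read the YYYYMMDD stamp from a name like GeoLite2-Country-CSV_20231031.zip."""
    match = re.search(r"_(\d{8})\.zip$", zip_name)
    if match is None:
        raise ValueError(f"no _YYYYMMDD.zip suffix in {zip_name!r}")
    return datetime.strptime(match.group(1), "%Y%m%d").date()


def write_github_output(key: str, value: str) -> None:
    """Append a single-line key=value pair to the file named by $GITHUB_OUTPUT."""
    with open(os.environ["GITHUB_OUTPUT"], "a", encoding="utf-8") as f:
        f.write(f"{key}={value}\n")
```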




Time To Terraform Azure IP Groups!!


The key to this Terraform is passing in the names of the countries you want to whitelist as a GitHub Action variable, like this:

COUNTRY_FILES_TO_WHITELIST=United States,India


The pipeline passes the comma-delimited list of countries to Terraform as a variable (TF_VAR_countries in the GitHub Actions workflow). Terraform then builds a map of IP lists keyed by ISO country code, which in my case looks like this:

{CountryCode = [list of ALL IPs for that country], CountryCode = [list of ALL IPs for that country], ...}

{IN=[ip1,ip2,ip3...],US=[ip1,ip2,ip3...]}
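The same lookup can be sketched in Python to make the Terraform locals easier to follow. It splits the variable on commas and finds the single file matching *_<country>.txt. It then keys that file's lines by the ISO code before the first underscore. The function name and folder argument are assumptions for illustration. Terraform itself does this with split(), fileset(), one() and file().

```python
from pathlib import Path


def ips_by_country(countries_var: str, folder: Path) -> dict[str, list[str]]:
    """Python mirror of the Terraform locals: 'United States,India' -> {'US': [...], 'IN': [...]}."""
    result: dict[str, list[str]] = {}
    # Split the comma-separated variable and drop stray spaces or empty entries
    countries = [c.strip() for c in countries_var.split(",") if c.strip()]
    for country in countries:
        matches = sorted(folder.glob(f"*_{country}.txt"))
        # Terraform fails on zero or several matches (one() plus file()), so insist on exactly one
        if len(matches) != 1:
            raise ValueError(f"expected one file for {country!r}, found {len(matches)}")
        # The ISO code is everything before the first underscore, e.g. IN_India.txt -> IN
        iso_code = matches[0].name.split("_", 1)[0]
        # splitlines() also copes with Windows CRLF line endings
        lines = matches[0].read_text(encoding="utf-8").splitlines()
        result[iso_code] = [line.strip() for line in lines if line.strip()]
    return result
```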


Azure IP Groups accept at most 5000 entries each, so I needed to chunk the lists. I chose 4900 per IP Group, which leaves 100 slots free for manual additions or something along those lines.

{ CountryCode_IP_GROUP_Number = { iplist = [up to 4900 IPs] }, ... }, for example:

{ IN_IP_GROUP_1 = { iplist = ["1.6.0.0/15","1.10.10.0/24",...] }, US_IP_GROUP_1 = { iplist = [...] } }
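The chunking is plain fixed-size slicing. Below is a Python sketch that mirrors chunklist() plus the group naming in the Terraform that follows, where group numbers start at 1. The function name chunk_ip_groups is hypothetical. The 5000 cap comes from the IP Group limit described above.

```python
def chunk_ip_groups(ips_by_country: dict[str, list[str]], size: int = 4900) -> dict[str, dict[str, list[str]]]:
    """Python mirror of chunklist() + merge(): {'IN': [...]} -> {'IN_IP_GROUP_1': {'iplist': [...]}, ...}."""
    # Azure IP Groups cap out at 5000 entries, so refuse chunk sizes above that
    if not 0 < size <= 5000:
        raise ValueError("chunk size must be between 1 and 5000")
    groups: dict[str, dict[str, list[str]]] = {}
    for code, ips in ips_by_country.items():
        # Fixed-size slices; group numbers start at 1, like index + 1 in the Terraform
        for index, start in enumerate(range(0, len(ips), size)):
            groups[f"{code}_IP_GROUP_{index + 1}"] = {"iplist": ips[start:start + size]}
    return groups
```

For example, a country with 10,000 networks ends up in three groups of 4900, 4900 and 200 entries.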


terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = ">= 3.75.0"
    }
  }

  backend "azurerm" {
    resource_group_name  = "<your resource group name here>"
    storage_account_name = "<your storage account name here>"
    container_name       = "<your container name here>"
    use_oidc             = true
  }
}

# Configure the Microsoft Azure Provider
provider "azurerm" {
  features {}
}

#Comma-separated country names, passed in by the pipeline as TF_VAR_countries (this can also live in variables.tf)
variable "countries" {
  type = string
}

locals {
  #Split List Of Countries (GitHub Variable - comma separated), trimming any spaces after the commas
  scountries = [for c in split(",", var.countries) : trimspace(c)]

  #Find Each Country's File - each country file comes from the veradigm-techops/network-security-authorized-geo-ips repo - files named as abbr_country.txt (ex: IN_India.txt) etc
  #The plan fails if a country name matches zero files or more than one file
  countryfiles = { for c in local.scountries : c => one(fileset(path.module, "maxmind/ip_lists_by_country/*_${c}.txt")) }

  #Load Up The Ips From Each Country List, keyed by the ISO code before the first "_" in the file name
  #ipsbycountry ends up looking like {IN=[ip1,ip2,ip3...],US=[ip1,ip2,ip3...]}
  #trimspace on each line also strips any "\r" left over from Windows line endings
  ipsbycountry = { for c, f in local.countryfiles : split("_", basename(f))[0] => [for ip in split("\n", trimspace(file("${path.module}/${f}"))) : trimspace(ip) if trimspace(ip) != ""] }

  #Chunk the List of IPS per country into 4900 ips per IP_Group
  #ipgroups ends up looking like { IN_IP_GROUP_1 = { iplist = ["1.6.0.0/15","1.10.10.0/24",...] }, US_IP_GROUP_1 = { iplist = [...] } }
  ipgroups = merge([for parentKey, parentValue in local.ipsbycountry : {
    for index, cl in chunklist(parentValue, 4900) : "${parentKey}_IP_GROUP_${index + 1}" => {
      iplist = cl
    }
    }
  ]...)
}

#Get Resource Group Where Work Is Occurring
data "azurerm_resource_group" "ipgroupsrg" {
  name = "<your resource group name here>"
}

#Create The IP_Groups As Designed By The Map Created In locals
resource "azurerm_ip_group" "ipgroup" {
  for_each            = local.ipgroups
  name                = each.key
  location            = data.azurerm_resource_group.ipgroupsrg.location
  resource_group_name = data.azurerm_resource_group.ipgroupsrg.name
  cidrs               = each.value.iplist
  timeouts {
    create = "4h"
    update = "4h"
  }
}

GitHub Actions YAML:

name: Update Ip Groups

on:
  workflow_dispatch:
    inputs:
      tfaction:
        type: choice
        description: Terraform Action
        options: 
          - apply
          - destroy
          - plan

permissions:
    id-token: write
    contents: read

jobs:
  terraform:
    name: Create/Update or Destroy IP Groups    
    runs-on: ubuntu-latest
    
    env:
      ARM_CLIENT_ID: ${{secrets.AZ_MI_CLIENT_ID}}
      ARM_TENANT_ID: ${{secrets.AZ_MI_TENANT_ID}}
      ARM_SUBSCRIPTION_ID:  ${{secrets.AZ_MI_SUBSCRIPTION_ID}}
      TF_VAR_countries: ${{vars.COUNTRY_FILES_TO_WHITELIST}}
      SLACK_WEBHOOK_URL: ${{secrets.SLACK_WEBHOOK_URL}}
      SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK            

    steps:     
      - name: Checkout Repository
        uses: actions/checkout@v4
        
      - name: Checkout Maxmind Country Lists Repo
        uses: actions/checkout@v4
        with:
          repository: <repo path to Maxmind repo to pull down country text files from>
          ssh-key: ${{ secrets.GH_DEPLOYMENT_KEY }}
          path: maxmind
      
      - name: 'Login To Azure'
        uses: azure/login@v1
        with:
          client-id: ${{secrets.AZ_MI_CLIENT_ID}}
          tenant-id: ${{secrets.AZ_MI_TENANT_ID}}
          subscription-id: ${{secrets.AZ_MI_SUBSCRIPTION_ID}}
  
      - name: Setup `terraform`
        uses: hashicorp/setup-terraform@v2
    
      - name: Terraform Init
        id: init
        run: "terraform init -input=false -backend-config=key=yourtfstatefile.terraform.tfstate"
      
      - name: Terraform Validate
        id: validate
        run: "terraform validate -no-color"
      
      - name: Terraform PLAN ONLY
        if: "${{ github.event.inputs.tfaction == 'plan' }}"
        id: plan
        run: "terraform plan"      

      - name: Terraform Apply
        if: "${{ github.event.inputs.tfaction == 'apply' }}"
        id: apply
        run: "terraform apply -auto-approve"
  
      - name: Terraform Destroy
        if: "${{ github.event.inputs.tfaction == 'destroy' }}"
        id: destroy
        run: "terraform destroy -auto-approve"

      - name: Notify Of IP_Groups Updated
        if: "${{ github.event.inputs.tfaction == 'apply' }}"
        id: slack
        uses: slackapi/slack-github-action@v1.24.0
        with:
          payload: |
            {
              "text": " Whitelisted IP_Groups Updated",
              "blocks": [
                {
                  "type": "section",
                  "text": {
                    "type": "mrkdwn",
                    "text": "The ${{github.event.repository.name}} scheduled pipeline run has updated the IP Groups"
                  }
                }
              ]
            }

When everything has completed, we'll have IP Groups named <CountryCode>_IP_GROUP_<index>, each holding up to 4900 IP entries, like this:


IN_IP_GROUP_1 = [IN networks 1 through 4900]

IN_IP_GROUP_2 = [IN networks 4901 through 9800, or whatever remains]


US_IP_GROUP_1 = [US networks 1 through 4900]

US_IP_GROUP_2 = [US networks 4901 through 9800, or whatever remains]


And so on. Because our copy of the Maxmind data is refreshed every Wednesday on a schedule, we can now schedule this GitHub Action pipeline to update our Azure IP Groups via Terraform with the new data. That way our US and IN geo-IP whitelists are always up to date.





 
 
 
