Portal:DeveloperDocs/nftables internals

From nftables wiki
Revision as of 17:47, 19 May 2022 by Phil (talk | contribs) (Added section about libnftables)

This page contains information for Netfilter developers on how nftables internals work.

The kernel subsystem

The nf_tables kernel subsystem contains 2 key components:

  • the netlink API (i.e., the control plane API)
  • the nf_tables core (i.e., the data plane engine)

Other components, such as external modules, are also in place and are intermixed with both the API and the core.

Generally speaking, the nf_tables subsystem implements a virtual machine whose low-level expressions operate on network packets.

TODO: add info.

nf_tables netlink API

The source code is mostly in net/netfilter/nf_tables_api.c [elixir src] [git src]

TODO: add info.

nf_tables core

The source code is mostly in net/netfilter/nf_tables_core.c [elixir src] [git src]

One of the most important functions in the core lives there: nft_do_chain(). In a nutshell, this is the function that evaluates network packets against the ruleset.

The logic in this function is rather simple:

  • for each rule in the chain
    • for each low level expression in the rule
      • evaluate the packet against the expression
    • evaluate expression return code (break, continue, drop, accept, jump, goto, etc)
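The loop above can be sketched as a small toy model. This is not the kernel code: every toy_* name below is hypothetical, the verdict set is reduced to four values, and jump/goto handling is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy verdicts; illustrative only, not the kernel's verdict codes. */
enum toy_verdict { TOY_CONTINUE, TOY_BREAK, TOY_ACCEPT, TOY_DROP };

struct toy_pkt {
	const uint8_t *data;
	size_t len;
};

struct toy_regs {
	enum toy_verdict verdict;   /* verdict register */
	uint32_t data[4];           /* general purpose registers */
};

/* One low-level expression: an eval callback plus a single operand. */
struct toy_expr {
	void (*eval)(const struct toy_expr *e, struct toy_regs *regs,
		     const struct toy_pkt *pkt);
	uint32_t arg;
};

struct toy_rule {
	const struct toy_expr *exprs;
	size_t n_exprs;
};

/* Leave the current rule unless the first packet byte equals e->arg. */
static void toy_match_first_byte(const struct toy_expr *e,
				 struct toy_regs *regs,
				 const struct toy_pkt *pkt)
{
	if (pkt->len < 1 || pkt->data[0] != e->arg)
		regs->verdict = TOY_BREAK;
}

/* Store a final verdict in the verdict register. */
static void toy_set_verdict(const struct toy_expr *e, struct toy_regs *regs,
			    const struct toy_pkt *pkt)
{
	(void)pkt;
	regs->verdict = (enum toy_verdict)e->arg;
}

/* Evaluate a packet against every rule; fall back to the chain policy. */
static enum toy_verdict toy_do_chain(const struct toy_rule *rules,
				     size_t n_rules,
				     const struct toy_pkt *pkt,
				     enum toy_verdict policy)
{
	struct toy_regs regs = { TOY_CONTINUE, { 0 } };

	assert(pkt != NULL);
	for (size_t i = 0; i < n_rules; i++) {
		regs.verdict = TOY_CONTINUE;
		for (size_t j = 0; j < rules[i].n_exprs; j++) {
			const struct toy_expr *e = &rules[i].exprs[j];

			e->eval(e, &regs, pkt);
			if (regs.verdict != TOY_CONTINUE)
				break;   /* stop evaluating this rule */
		}
		if (regs.verdict == TOY_CONTINUE || regs.verdict == TOY_BREAK)
			continue;        /* no final verdict: try the next rule */
		return regs.verdict;     /* accept or drop ends the chain */
	}
	return policy;
}
```

In this sketch, a rule built from toy_match_first_byte followed by toy_set_verdict only reaches its verdict when the match expression leaves the verdict register untouched, which mirrors how a mismatch in an early expression skips the rest of a rule.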

TODO: add info.

expressions

There are many low-level expressions that allow us to operate on network packets in different ways. You can think of these low-level expressions as assembly-like instructions.

  • nft_immediate: loads an immediate value into a register.
  • nft_cmp: compares given data with data from a given register.
  • nft_payload: sets/gets arbitrary data in packet headers.
  • nft_bitwise: performs bit-wise math operations on data in a given register.
  • nft_byteorder: performs byte order operations on data in a given register.
  • nft_counter: a basic packet/byte counter that is incremented every time it is evaluated for a packet.
  • nft_meta: sets/gets packet meta information, such as related interfaces, timestamps, etc.
  • nft_lookup: searches a set for data from a given register (the key). If the set is a map/vmap, returns the value for that key.
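To illustrate how such instructions chain together, here is a toy sketch of a subnet match in the spirit of "ip saddr 192.168.0.0/24": a payload load, a bitwise mask, and a compare. The toy_* helpers are hypothetical stand-ins for the kernel expressions, the register is a plain byte array, and the actual expression sequence emitted by nft may differ (for example, it may be optimised).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* payload: copy len bytes at offset from the header into the register. */
static bool toy_payload_load(const uint8_t *hdr, size_t hdr_len,
			     size_t offset, size_t len, uint8_t *reg)
{
	assert(hdr != NULL && reg != NULL);
	if (offset > hdr_len || len > hdr_len - offset)
		return false;            /* not enough packet data */
	memcpy(reg, hdr + offset, len);
	return true;
}

/* bitwise: reg = (reg & mask) ^ xor_val, byte by byte. */
static void toy_bitwise(uint8_t *reg, const uint8_t *mask,
			const uint8_t *xor_val, size_t len)
{
	for (size_t i = 0; i < len; i++)
		reg[i] = (uint8_t)((reg[i] & mask[i]) ^ xor_val[i]);
}

/* cmp: true when the register content equals the stored data. */
static bool toy_cmp_eq(const uint8_t *reg, const uint8_t *data, size_t len)
{
	return memcmp(reg, data, len) == 0;
}

/* Match an IPv4 source address against 192.168.0.0/24. */
static bool toy_match_saddr_subnet(const uint8_t *ip_hdr, size_t hdr_len)
{
	static const uint8_t mask[4]    = { 255, 255, 255, 0 };
	static const uint8_t xor_val[4] = { 0, 0, 0, 0 };
	static const uint8_t net[4]     = { 192, 168, 0, 0 };
	uint8_t reg[4];

	/* The source address sits at offset 12 of the IPv4 header. */
	if (!toy_payload_load(ip_hdr, hdr_len, 12, 4, reg))
		return false;
	toy_bitwise(reg, mask, xor_val, sizeof(reg));
	return toy_cmp_eq(reg, net, sizeof(reg));
}
```

The register keeps the address in network byte order, so the mask and the comparison value are written byte by byte in the same order.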

TODO: add info.

The userspace components

There are several important components in the userspace part of nftables:

  • libmnl: generic low level library used to communicate with the kernel using netlink sockets.
  • libnftnl: low level library that is capable of interacting with the nf_tables subsystem netlink API in the kernel. It is responsible for creating/parsing the nf_tables netlink messages. Uses libmnl under the hood.
  • libnftables: high level library that implements the logic to translate from high level statements to netlink objects and the other way around. Uses libnftnl under the hood.
  • nft: the command line interface binary. This is what most end users actually use on their systems. It reads user input and calls libnftables under the hood.

Generally speaking, userspace compiles high-level statements (rules, etc.) into the netlink bytecode that the kernel API understands. When inspecting the ruleset (i.e., listing it), it does the opposite: it reconstructs high-level statements from the low-level netlink bytecode.
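As a rough illustration of this round trip, the toy sketch below turns a single "ip saddr <address>" statement into two instructions (a payload load followed by a compare) and recognises that pattern again when going the other way. All toy_* names and the instruction layout are hypothetical; the real translation in nft handles far more cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* High-level form: "ip saddr <addr>" (toy representation). */
struct toy_stmt {
	uint32_t saddr;
};

enum toy_op { TOY_OP_PAYLOAD, TOY_OP_CMP };

/* Low-level form: one VM instruction. */
struct toy_insn {
	enum toy_op op;
	unsigned int reg;      /* register written (payload) or read (cmp) */
	unsigned int offset;   /* payload only: offset into the IPv4 header */
	unsigned int len;      /* payload only: number of bytes to load */
	uint32_t value;        /* cmp only: value to compare against */
};

#define TOY_IP_SADDR_OFFSET 12
#define TOY_IP_SADDR_LEN     4

/* "Compile": emit payload + cmp; returns the number of instructions. */
static size_t toy_linearize(const struct toy_stmt *stmt,
			    struct toy_insn out[2])
{
	assert(stmt != NULL);
	out[0] = (struct toy_insn){ .op = TOY_OP_PAYLOAD, .reg = 1,
				    .offset = TOY_IP_SADDR_OFFSET,
				    .len = TOY_IP_SADDR_LEN };
	out[1] = (struct toy_insn){ .op = TOY_OP_CMP, .reg = 1,
				    .value = stmt->saddr };
	return 2;
}

/* "Decompile": recognise payload + cmp on the same register. */
static bool toy_delinearize(const struct toy_insn *in, size_t n,
			    struct toy_stmt *stmt)
{
	if (n != 2 || in[0].op != TOY_OP_PAYLOAD || in[1].op != TOY_OP_CMP)
		return false;
	if (in[0].reg != in[1].reg)
		return false;   /* cmp reads a register payload did not fill */
	if (in[0].offset != TOY_IP_SADDR_OFFSET ||
	    in[0].len != TOY_IP_SADDR_LEN)
		return false;   /* some other header field */
	stmt->saddr = in[1].value;
	return true;
}
```

The decompile step works by tracking which register the payload instruction filled and checking that the compare reads the same one, which is why an unknown offset or a mismatched register makes it give up.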

libnftnl

This library provides data structures for the entities that exist in nf_tables nomenclature, such as tables, chains and rules. It serves as an intermediate layer between the nftables and iptables-nft user space applications and the nfnetlink messages the kernel sends and receives.

In general, each data structure comes with a set of handling routines:

allocators
To allocate and free an object of given type
setters/getters
Data structure fields are accessed via an attribute number (via a specific enum field)
serializers
Populating a netlink message or vice versa
printers
Providing a textual representation, mostly for debugging purposes
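The allocator and setter/getter pattern can be sketched with a self-contained toy object. This is not the libnftnl API: the toy_table type, its attribute enum and all function names are hypothetical, and serializers and printers are left out. The flags bitmask that records which attributes hold a value is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Attribute numbers used to address fields of the object. */
enum toy_table_attr { TOY_TABLE_NAME, TOY_TABLE_FAMILY };

struct toy_table {
	char *name;
	uint32_t family;
	uint32_t flags;   /* bit N set means attribute N holds a value */
};

/* Allocators: create and destroy an object. */
static struct toy_table *toy_table_alloc(void)
{
	return calloc(1, sizeof(struct toy_table));
}

static void toy_table_free(struct toy_table *t)
{
	if (t == NULL)
		return;
	free(t->name);
	free(t);
}

static bool toy_table_is_set(const struct toy_table *t,
			     enum toy_table_attr attr)
{
	return (t->flags & (1u << attr)) != 0;
}

/* Setters: the field is chosen by attribute number, not by member name. */
static int toy_table_set_str(struct toy_table *t, enum toy_table_attr attr,
			     const char *str)
{
	size_t len;
	char *copy;

	assert(t != NULL && str != NULL);
	if (attr != TOY_TABLE_NAME)
		return -1;
	len = strlen(str) + 1;
	copy = malloc(len);
	if (copy == NULL)
		return -1;
	memcpy(copy, str, len);
	free(t->name);              /* replace any previous value */
	t->name = copy;
	t->flags |= 1u << attr;
	return 0;
}

static int toy_table_set_u32(struct toy_table *t, enum toy_table_attr attr,
			     uint32_t val)
{
	assert(t != NULL);
	if (attr != TOY_TABLE_FAMILY)
		return -1;
	t->family = val;
	t->flags |= 1u << attr;
	return 0;
}

/* Getters: return NULL or 0 when the attribute was never set. */
static const char *toy_table_get_str(const struct toy_table *t,
				     enum toy_table_attr attr)
{
	if (attr != TOY_TABLE_NAME || !toy_table_is_set(t, attr))
		return NULL;
	return t->name;
}

static uint32_t toy_table_get_u32(const struct toy_table *t,
				  enum toy_table_attr attr)
{
	if (attr != TOY_TABLE_FAMILY || !toy_table_is_set(t, attr))
		return 0;
	return t->family;
}
```

Addressing fields through an attribute number keeps the structure layout private, so fields can be added later without breaking callers.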

Where sensible, there is a list-variant, too. If so, it comes with handling routines as well:

allocators
Allocating and freeing the list object (and members)
populators
Add and remove from the list

Where useful, there may be a lookup routine as well. For example, the nftnl_chain_list object also contains a hash table of chain names, so looking up a chain by name is faster than a linear search.

A typical extra for list objects is an iterator: a data structure that holds state while walking the list. Usually the only routines used are the allocators and a next routine.

These are the entities defined by libnftnl:

table
A rather boring "namespace" for chains
chain
A container for rules, may attach to a netfilter hook in kernel
rule
A container for expressions
expr
An nftables VM code instruction
flowtable
Similar to a chain, but holds flows between interfaces
obj
A generic object, typically holding stateful information
ruleset
A container for lists of tables, chains, sets and rules - not used by nftables application anymore
set
A container for elements
set_elem
A set element
trace
A trace event sent by the kernel

nftnl_expr

While nftables distinguishes between expressions and statements, that distinction does not really exist at the libnftnl layer. For instance, a statement like:

ip saddr 192.168.0.1

is actually two expressions:

payload
loading IPv4 header's source address into a register
cmp
comparing data from a register against a stored value

Since expressions have access to the packet, its metadata and all nftables registers (including the verdict register), and may store multiple values internally, they are powerful and versatile.

nftnl_obj

This is a common API for various object types. An object's type is defined post allocation by setting the NFTNL_OBJ_TYPE attribute. Currently existing object types are:

  • counter
  • quota
  • ct helper
  • limit
  • tunnel
  • ct timeout
  • secmark
  • ct expect
  • synproxy

nftnl_batch

This is a wrapper interface around the same functionality in libmnl (which is used internally). In general, nftnl batches aid in collecting multiple netlink messages for kernel submission.

libnftables

One goal in nftables development was to provide users with a library for easier integration into applications than "shelling out" using system() and trying to parse nft command output.

At first, libnftnl was supposed to achieve this, but it exposes internal implementation details and is rather low-level in general, which made it unsuitable from a user's perspective.

To overcome this, the nft backend code was separated into a library that fills the gap between libnftnl on one side and the nft application itself on the other.

Usage of libnftables is supposed to be simple and straightforward, almost like calling nft itself but with a bit more convenience. The first step is to create a new context:

 struct nft_ctx *ctx = nft_ctx_new(0);

The context allows configuring library behaviour on a per-session basis. With this in place, nftables commands may be executed:

 int rc = nft_run_cmd_from_buffer(ctx, "add table inet t");

or whole dump files may be loaded:

 int rc = nft_run_cmd_from_filename(ctx, "/etc/nftables/all-in-one.nft");

To control output, there are a number of functions:

 FILE *nft_ctx_set_output(struct nft_ctx *ctx, FILE *fp);
 int nft_ctx_buffer_output(struct nft_ctx *ctx);
 int nft_ctx_unbuffer_output(struct nft_ctx *ctx);
 const char *nft_ctx_get_output_buffer(struct nft_ctx *ctx);

An equivalent set of functions controls error output, which goes to stderr by default. See the libnftables(3) man page for further details.

nft: from userspace to the kernel

TODO: add info.

nft: from the kernel to userspace

TODO: add info.