Portal:DeveloperDocs/nftables internals
This page contains information for Netfilter developers on how nftables internals work.
The kernel subsystem
The nf_tables kernel subsystem contains two key components:
- the netlink API (i.e., the control plane API)
- the nf_tables core (i.e., the data plane engine)
Other components, such as external modules, are also in place and are intermixed with both the API and the core.
Generally speaking, the nf_tables subsystem implements a virtual machine of low-level expressions that operate on network packets.
TODO: add info.
nf_tables netlink API
The source code is mostly in net/netfilter/nf_tables_api.c [elixir src] [git src]
TODO: add info.
nf_tables core
The source code is mostly in net/netfilter/nf_tables_core.c [elixir src] [git src]
There you can see one of the most important functions in the core: nft_do_chain(). In a nutshell, this is the function that evaluates network packets against the ruleset.
The logic in this function is rather simple:
- for each rule in the chain
  - for each low level expression in the rule
    - evaluate the packet against the expression
    - evaluate expression return code (break, continue, drop, accept, jump, goto, etc)
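The loop above can be modelled in a few lines of Python. This is a toy sketch, not the kernel code: the Verdict enum, the Rule type, the callable-expression convention and the two helper expressions are all assumptions made for illustration. Jump and goto, which need a chain stack, are left out.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, List


class Verdict(Enum):
    CONTINUE = auto()  # expression matched, keep evaluating this rule
    BREAK = auto()     # expression did not match, move on to the next rule
    ACCEPT = auto()
    DROP = auto()


# Hypothetical expression signature: (packet, registers) -> Verdict
Expr = Callable[[bytes, Dict[int, bytes]], Verdict]


@dataclass
class Rule:
    exprs: List[Expr] = field(default_factory=list)


def do_chain(rules, packet, policy=Verdict.ACCEPT):
    """Evaluate a packet against every rule; fall back to the chain policy."""
    regs: Dict[int, bytes] = {}
    for rule in rules:
        for expr in rule.exprs:
            verdict = expr(packet, regs)
            if verdict is Verdict.CONTINUE:
                continue          # next expression in the same rule
            if verdict is Verdict.BREAK:
                break             # abandon this rule, try the next one
            return verdict        # terminal verdict (accept/drop)
    return policy


def match_first_byte(value):
    """Illustrative expression: matches when the first packet byte equals value."""
    def expr(packet, regs):
        return Verdict.CONTINUE if packet[:1] == bytes([value]) else Verdict.BREAK
    return expr


def fixed_verdict(verdict):
    """Illustrative expression: unconditionally issues the given verdict."""
    return lambda packet, regs: verdict


rules = [Rule([match_first_byte(0x45), fixed_verdict(Verdict.DROP)])]
print(do_chain(rules, b"\x45\x00"))
print(do_chain(rules, b"\x60\x00"))
```

The first packet matches the rule and is dropped; the second fails the match, so the rule is skipped and the policy applies.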
TODO: add info.
expressions
There are many low-level expressions that allow us to operate on network packets in different ways. You can think of these low-level expressions as assembly-like instructions.
- nft_immediate: loads an immediate value into a register.
- nft_cmp: compares given data with data from a given register.
- nft_payload: set/get arbitrary data from packet headers.
- nft_bitwise: perform bit-wise math operations over data in a given register.
- nft_byteorder: perform byte order operations over data in a given register.
- nft_counter: a basic packet/byte counter that gets incremented every time it is evaluated for a packet.
- nft_meta: set/get packet meta information, such as related interfaces, timestamps, etc.
- nft_lookup: looks up data from a given register (the key) in a set. If the set is a map/vmap, it returns the value for that key.
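To illustrate how these instructions combine, here is a small Python model of a subnet match built from a payload load, a bitwise operation and a comparison. It is a sketch under stated assumptions: the function names are invented for illustration, and the bitwise step models only the basic boolean form, (data AND mask) XOR xor.

```python
import ipaddress


def payload_load(packet, offset, length):
    """Model of a payload load: copy `length` bytes at `offset` into a register."""
    return packet[offset:offset + length]


def bitwise(data, mask, xor):
    """Model of a boolean bitwise operation: (data & mask) ^ xor, byte by byte."""
    return bytes((d & m) ^ x for d, m, x in zip(data, mask, xor))


def cmp_eq(data, expected):
    """Model of an equality comparison between a register and stored data."""
    return data == expected


def matches_subnet(ip_header, network):
    """Check whether the IPv4 source address belongs to `network`."""
    net = ipaddress.ip_network(network)
    reg = payload_load(ip_header, 12, 4)              # IPv4 source address
    reg = bitwise(reg, net.netmask.packed, bytes(4))  # keep the network bits
    return cmp_eq(reg, net.network_address.packed)


def fake_ipv4_header(source):
    """Hypothetical helper: a 20-byte header with only the source address set."""
    return bytes(12) + ipaddress.ip_address(source).packed + bytes(4)


print(matches_subnet(fake_ipv4_header("192.168.0.7"), "192.168.0.0/24"))
print(matches_subnet(fake_ipv4_header("10.0.0.1"), "192.168.0.0/24"))
```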
TODO: add info.
The userspace components
There are several important components in the userspace part of nftables:
- libmnl: generic low level library used to communicate with the kernel using netlink sockets.
- libnftnl: low level library that is capable of interacting with the nf_tables subsystem netlink API in the kernel. It is responsible for creating/parsing the nf_tables netlink messages. Uses libmnl under the hood.
- libnftables: high level library that implements the logic to translate from high level statements to netlink objects and the other way around. Uses libnftnl under the hood.
- nft: the command line interface binary. This is what most end users actually use in their systems. It reads user input and calls libnftables under the hood.
Generally speaking, the userspace compiles high level statements (rules, etc.) into the netlink bytecode that the kernel API understands. When inspecting the ruleset (i.e., listing it), it does the opposite: it reconstructs high level statements from the low level netlink bytecode.
libnftnl
This library provides data structures for the entities that exist in nf_tables nomenclature, such as tables, chains and rules. It serves as an intermediate layer between the nftables and iptables-nft user space applications and the nfnetlink messages the kernel sends and receives.
In general, each data structure comes with a set of handling routines:
- allocators
  - To allocate and free an object of given type
- setters/getters
  - Data structure fields are accessed via an attribute number (via a specific enum field)
- serializers
  - Populating a netlink message or vice versa
- printers
  - Providing a textual representation, mostly for debugging purposes
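The pattern above (attribute-numbered setters/getters plus a serializer and a printer) can be sketched in Python as follows. This is a model, not the libnftnl API: the Table class, the TableAttr numbers and the choice of two attributes are assumptions made for illustration. The attribute encoding follows the generic netlink layout (16-bit length, 16-bit type, payload padded to 4 bytes); the sketch stores strings NUL-terminated and integers in big-endian order.

```python
import struct
from enum import IntEnum


class TableAttr(IntEnum):
    # Hypothetical attribute numbers, chosen for this sketch only.
    NAME = 1
    FLAGS = 2


def nla_pack(attr_type, payload):
    """Encode one netlink attribute: u16 length, u16 type, padded payload."""
    length = 4 + len(payload)
    return struct.pack("=HH", length, attr_type) + payload + bytes((-length) % 4)


def nla_unpack(buf):
    """Decode a run of netlink attributes into {type: raw payload}."""
    attrs, offset = {}, 0
    while offset < len(buf):
        length, attr_type = struct.unpack_from("=HH", buf, offset)
        attrs[attr_type] = buf[offset + 4:offset + length]
        offset += (length + 3) & ~3  # attributes are 4-byte aligned
    return attrs


class Table:
    def __init__(self):                 # "allocator"
        self._attrs = {}

    def set(self, attr, value):         # setter, keyed by attribute number
        self._attrs[TableAttr(attr)] = value

    def get(self, attr):                # getter, returns None when unset
        return self._attrs.get(TableAttr(attr))

    def build_payload(self):            # serializer: object -> attributes
        out = b""
        if TableAttr.NAME in self._attrs:
            name = self._attrs[TableAttr.NAME].encode() + b"\0"
            out += nla_pack(TableAttr.NAME, name)
        if TableAttr.FLAGS in self._attrs:
            flags = struct.pack(">I", self._attrs[TableAttr.FLAGS])
            out += nla_pack(TableAttr.FLAGS, flags)
        return out

    @classmethod
    def parse(cls, buf):                # serializer: attributes -> object
        table, raw = cls(), nla_unpack(buf)
        if TableAttr.NAME in raw:
            table.set(TableAttr.NAME, raw[TableAttr.NAME].rstrip(b"\0").decode())
        if TableAttr.FLAGS in raw:
            table.set(TableAttr.FLAGS, struct.unpack(">I", raw[TableAttr.FLAGS])[0])
        return table

    def __str__(self):                  # printer, for debugging
        return f"table {self.get(TableAttr.NAME)} flags {self.get(TableAttr.FLAGS) or 0:#x}"


table = Table()
table.set(TableAttr.NAME, "filter")
table.set(TableAttr.FLAGS, 0)
print(Table.parse(table.build_payload()))
```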
Where sensible, there is a list-variant, too. If so, it comes with handling routines as well:
- allocators
  - Allocating and freeing the list object (and members)
- populators
  - Add and remove from the list
Where useful, there may be a lookup routine as well. For example, the nftnl_chain_list object also contains a hash table of chain names, so looking up a chain by name is faster than a linear search.
A typical extra for list objects is an iterator: a data structure that holds state while browsing through the list. Usually the only routines used are the allocators and a next routine.
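A minimal Python model of such a list object, with a name index and a separate iterator that holds the browsing state, might look like this. The class and method names are hypothetical and only loosely mirror the libnftnl naming; the real library is written in C and manages memory explicitly.

```python
from dataclasses import dataclass


@dataclass
class Chain:
    name: str


class ChainList:
    """List of chains plus a name index, so lookups avoid a linear search."""

    def __init__(self):
        self._chains = []    # keeps insertion order for iteration
        self._by_name = {}   # name -> chain, the "hash table" of the model

    def add(self, chain):
        self._chains.append(chain)
        self._by_name[chain.name] = chain

    def remove(self, chain):
        self._chains.remove(chain)
        del self._by_name[chain.name]

    def lookup_byname(self, name):
        return self._by_name.get(name)


class ChainListIter:
    """Iterator object: holds the current position while browsing the list."""

    def __init__(self, chain_list):
        self._items = chain_list._chains
        self._pos = 0

    def next(self):
        # Returns None at the end of the list, like a C routine returning NULL.
        if self._pos >= len(self._items):
            return None
        item = self._items[self._pos]
        self._pos += 1
        return item


chains = ChainList()
for name in ("input", "forward", "output"):
    chains.add(Chain(name))

it = ChainListIter(chains)
chain = it.next()
while chain is not None:
    print(chain.name)
    chain = it.next()
```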
These are the entities defined by libnftnl:
- table
  - A rather boring "namespace" for chains
- chain
  - A container for rules, may attach to a netfilter hook in kernel
- rule
  - A container for expressions
- expr
  - An nftables VM code instruction
- flowtable
  - Similar to a chain, but holds flows between interfaces
- obj
  - A generic object, typically holding stateful information
- ruleset
  - A container for lists of tables, chains, sets and rules - not used by nftables application anymore
- set
  - A container for elements
- set_elem
  - A set element
- trace
  - A trace event sent by the kernel
nftnl_expr
While nftables distinguishes between expressions and statements, this distinction does not quite exist at the libnftnl layer. For instance, a statement like:
ip saddr 192.168.0.1
is actually two expressions:
- payload
  - loading IPv4 header's source address into a register
- cmp
  - comparing data from a register against a stored value
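The split into two expressions can be modelled in Python as plain data plus a tiny evaluator. This is an illustrative sketch: the dictionary keys and the compile_ip_saddr() and evaluate() helpers are invented for the example and are not libnftnl attribute names. The offset 12 and length 4 are the position and size of the source address in an IPv4 header.

```python
import ipaddress


def compile_ip_saddr(address):
    """Model of compiling `ip saddr ADDRESS` into a payload and a cmp expression."""
    return [
        # Load 4 bytes at offset 12 of the network header into register 1.
        {"name": "payload", "base": "network", "offset": 12, "len": 4, "dreg": 1},
        # Compare register 1 against the stored address bytes.
        {"name": "cmp", "op": "eq", "sreg": 1,
         "data": ipaddress.IPv4Address(address).packed},
    ]


def evaluate(exprs, network_header):
    """Run the expressions in order; False as soon as a comparison fails."""
    regs = {}
    for expr in exprs:
        if expr["name"] == "payload":
            start = expr["offset"]
            regs[expr["dreg"]] = network_header[start:start + expr["len"]]
        elif expr["name"] == "cmp":
            if regs[expr["sreg"]] != expr["data"]:
                return False
    return True


exprs = compile_ip_saddr("192.168.0.1")
header = bytes(12) + bytes([192, 168, 0, 1]) + bytes(4)  # only saddr is set
print([expr["name"] for expr in exprs])
print(evaluate(exprs, header))
```

Running nft with --debug=netlink shows the actual expressions generated for a rule.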
Since expressions have access to the packet, its metadata and all nftables registers (including the verdict register), and may store multiple values internally, they are powerful and versatile.
nftnl_obj
This is a common API for various object types. An object's type is defined after allocation by setting the NFTNL_OBJ_TYPE attribute. The currently existing object types are:
- counter
- quota
- ct helper
- limit
- tunnel
- ct timeout
- secmark
- ct expect
- synproxy
nftnl_batch
This is a wrapper interface around the same functionality in libmnl (which is used internally). In general, nftnl batches aid in collecting multiple netlink messages for kernel submission.
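The idea of collecting messages for a single submission can be sketched as below. This Python model only shows the framing: a batch-begin message, the queued messages, and a batch-end message, each with a 16-byte netlink header. The Batch class, the message_types() helper and EXAMPLE_MSG_TYPE are assumptions for illustration. The nfgenmsg header that real nfnetlink messages carry and the buffer management done by the real libraries are left out.

```python
import struct

NLMSG_HDRLEN = 16            # size of struct nlmsghdr
NFNL_MSG_BATCH_BEGIN = 0x10  # values from linux/netfilter/nfnetlink.h
NFNL_MSG_BATCH_END = 0x11
EXAMPLE_MSG_TYPE = 0x0A00    # stand-in for an nf_tables message type


def nlmsg(msg_type, seq, payload=b"", flags=0):
    """Build one netlink message: length, type, flags, seq, pid, then payload."""
    length = NLMSG_HDRLEN + len(payload)
    header = struct.pack("=IHHII", length, msg_type, flags, seq, 0)
    return header + payload + bytes((-length) % 4)


class Batch:
    """Collects messages and frames them between batch begin/end markers."""

    def __init__(self, first_seq=1):
        self._first_seq = first_seq
        self._pending = []   # queued (type, payload) pairs

    def add(self, msg_type, payload):
        self._pending.append((msg_type, payload))

    def finish(self):
        seq = self._first_seq
        out = nlmsg(NFNL_MSG_BATCH_BEGIN, seq)
        for msg_type, payload in self._pending:
            seq += 1
            out += nlmsg(msg_type, seq, payload)
        return out + nlmsg(NFNL_MSG_BATCH_END, seq + 1)


def message_types(buf):
    """Walk a buffer of netlink messages and return their types in order."""
    types, offset = [], 0
    while offset < len(buf):
        length, msg_type = struct.unpack_from("=IH", buf, offset)
        types.append(msg_type)
        offset += (length + 3) & ~3
    return types


batch = Batch()
batch.add(EXAMPLE_MSG_TYPE, b"abcd")
data = batch.finish()
print(len(data), [hex(t) for t in message_types(data)])
```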
libnftables
TODO: add info.
nft: from userspace to the kernel
TODO: add info.
nft: from the kernel to userspace
TODO: add info.