Portal:DeveloperDocs/nftables internals
This page contains information for Netfilter developers on how nftables internals work.
The kernel subsystem
The nf_tables kernel subsystem contains two key components:
- the netlink API (i.e., the control plane API)
- the nf_tables core (i.e., the data plane engine)
Other components, such as external modules, are also in place and are intermixed with both the API and the core.
Generally speaking, the nf_tables subsystem implements a virtual machine of low-level expressions that operate on network packets.
TODO: add info.
nf_tables netlink API
The source code is mostly in net/netfilter/nf_tables_api.c [elixir src] [git src]
TODO: add info.
nf_tables core
The source code is mostly in net/netfilter/nf_tables_core.c [elixir src] [git src]
There you can find one of the most important functions in the core: nft_do_chain(). In a nutshell, this is the function that evaluates network packets against the ruleset.
The logic in this function is rather simple:
- for each rule in the chain
  - for each low-level expression in the rule
    - evaluate the packet against the expression
    - evaluate the expression return code (break, continue, drop, accept, jump, goto, etc.)
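The loop above can be sketched as a toy model in Python. This is not the kernel code: the names Verdict and do_chain, and the representation of a rule as a list of callables, are assumptions made for illustration. Jump and goto handling is left out to keep the sketch short.

```python
from enum import Enum, auto


class Verdict(Enum):
    CONTINUE = auto()  # expression matched; go on to the next expression
    BREAK = auto()     # expression did not match; skip the rest of this rule
    ACCEPT = auto()    # terminal verdict
    DROP = auto()      # terminal verdict


def do_chain(chain, packet, policy=Verdict.ACCEPT):
    """Toy stand-in for the chain evaluation loop (hypothetical API).

    `chain` is a list of rules; each rule is a list of callables that
    take (packet, regs) and return a Verdict.
    """
    regs = {}  # scratch registers shared by the expressions
    for rule in chain:
        for expr in rule:
            verdict = expr(packet, regs)
            if verdict is Verdict.CONTINUE:
                continue           # keep evaluating this rule
            if verdict is Verdict.BREAK:
                break              # mismatch: move on to the next rule
            return verdict         # accept/drop ends the evaluation
    return policy                  # no rule decided: fall back to the policy


# Two illustrative expressions: a match and a verdict.
def match_tcp(packet, regs):
    return Verdict.CONTINUE if packet["proto"] == "tcp" else Verdict.BREAK


def drop(packet, regs):
    return Verdict.DROP


chain = [[match_tcp, drop]]
print(do_chain(chain, {"proto": "tcp"}))
print(do_chain(chain, {"proto": "udp"}))
```

In this model a TCP packet hits the drop expression, while any other packet breaks out of the only rule and falls through to the chain policy.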
TODO: add info.
expressions
There are many low-level expressions that allow us to operate on network packets in different ways. You can think of these low-level expressions as assembly-like instructions.
- nft_immediate: loads an immediate value into a register.
- nft_cmp: compares the given data with the data in a given register.
- nft_payload: set/get arbitrary data from packet headers.
- nft_bitwise: perform bit-wise math operations over data in a given register.
- nft_byteorder: perform byte order operations over data in a given register.
- nft_counter: a basic packet/byte counter that is incremented every time it is evaluated for a packet.
- nft_meta: set/get packet meta information, such as related interfaces, timestamps, etc.
- nft_lookup: searches a set for the data held in a given register (the key). If the set is a map/vmap, it returns the value for that key.
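To make the assembly analogy concrete, here is a minimal Python sketch of how a rule such as "ip saddr 192.168.0.1 counter accept" might be expressed as a sequence of payload, cmp, counter and immediate steps. It is a toy model, not the kernel implementation: every function and class name below is hypothetical, and the packet is reduced to a bare 20-byte IPv4 header.

```python
import ipaddress

IPV4_SADDR_OFFSET = 12  # source address offset inside the IPv4 header
IPV4_ADDR_LEN = 4


def payload_load(offset, length, dreg):
    """Copy `length` bytes at `offset` of the packet into register `dreg`."""
    def run(pkt, regs):
        regs[dreg] = pkt[offset:offset + length]
        return True
    return run


def cmp_eq(sreg, data):
    """Match only if register `sreg` holds exactly `data`."""
    def run(pkt, regs):
        return regs[sreg] == data
    return run


class Counter:
    """Count the packets and bytes that reach this expression."""
    def __init__(self):
        self.packets = 0
        self.bytes = 0

    def __call__(self, pkt, regs):
        self.packets += 1
        self.bytes += len(pkt)
        return True


def immediate(dreg, value):
    """Load a constant into register `dreg` (here, the verdict register)."""
    def run(pkt, regs):
        regs[dreg] = value
        return True
    return run


def run_rule(exprs, pkt):
    """Run the expressions in order; stop on a mismatch or a final verdict."""
    regs = {"verdict": "continue"}
    for expr in exprs:
        matched = expr(pkt, regs)
        if not matched or regs["verdict"] != "continue":
            break
    return regs["verdict"]


def ipv4_header(saddr, daddr):
    """Build a zeroed 20-byte IPv4 header with only the addresses filled in."""
    return (bytes(12)
            + ipaddress.IPv4Address(saddr).packed
            + ipaddress.IPv4Address(daddr).packed)


counter = Counter()
rule = [
    payload_load(IPV4_SADDR_OFFSET, IPV4_ADDR_LEN, dreg=1),
    cmp_eq(1, ipaddress.IPv4Address("192.168.0.1").packed),
    counter,
    immediate("verdict", "accept"),
]
print(run_rule(rule, ipv4_header("192.168.0.1", "10.0.0.1")))
print(counter.packets, counter.bytes)
```

Because the counter sits after the comparison, it only counts packets that matched the source address, which mirrors how the position of an expression in a rule determines when it is evaluated.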
TODO: add info.
The userspace components
There are several important components in the userspace part of nftables:
- libmnl: generic low level library used to communicate with the kernel using netlink sockets.
- libnftnl: low level library that is capable of interacting with the nf_tables subsystem netlink API in the kernel. It is responsible for creating/parsing the nf_tables netlink messages. Uses libmnl under the hood.
- libnftables: high level library that implements the logic to translate from high level statements to netlink objects and the other way around. Uses libnftnl under the hood.
- nft: the command line interface binary. This is what most end users actually use in their systems. It reads user input and calls libnftables under the hood.
Generally speaking, the userspace compiles high-level statements (rules, etc.) into the netlink bytecode that the kernel API understands. When inspecting the ruleset (i.e., listing it), it does the opposite: it reconstructs high-level statements from the low-level netlink bytecode.
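The two directions can be illustrated with a toy round trip in Python. This is only a sketch under strong assumptions: it handles a single statement shape ("ip saddr ADDRESS VERDICT"), the tuple-based expression format is invented for the example, and the function names linearize and delinearize are hypothetical stand-ins for the real compile and decompile steps.

```python
import ipaddress


def linearize(statement):
    """Toy compiler: 'ip saddr <addr> <verdict>' -> low-level expressions."""
    words = statement.split()
    if len(words) != 4 or words[:2] != ["ip", "saddr"]:
        raise ValueError("unsupported statement: %r" % statement)
    addr = ipaddress.IPv4Address(words[2]).packed
    return [
        ("payload", {"base": "network", "offset": 12, "len": 4, "dreg": 1}),
        ("cmp", {"op": "eq", "sreg": 1, "data": addr}),
        ("immediate", {"dreg": "verdict", "data": words[3]}),
    ]


def delinearize(exprs):
    """Toy decompiler: recognise the pattern emitted by linearize()."""
    names = [name for name, _ in exprs]
    if names != ["payload", "cmp", "immediate"]:
        raise ValueError("unsupported expression sequence: %r" % names)
    payload, cmp_, imm = (attrs for _, attrs in exprs)
    # A 4-byte load at offset 12 of the network header is the IPv4 source.
    if (payload["base"], payload["offset"], payload["len"]) != ("network", 12, 4):
        raise ValueError("unknown payload location")
    addr = ipaddress.IPv4Address(cmp_["data"])
    return "ip saddr %s %s" % (addr, imm["data"])


bytecode = linearize("ip saddr 192.168.0.1 accept")
print(bytecode)
print(delinearize(bytecode))
```

The decompile direction has to recognise patterns (here, "4 bytes at offset 12 of the network header") and map them back to a human-readable name, since the low-level form carries only offsets, lengths and registers.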
libnftnl
TODO: add info.
libnftables
TODO: add info.
nft: from userspace to the kernel
TODO: add info.
nft: from the kernel to the userspace
TODO: add info.