AI Local LLM Engineer
An AI Local LLM Engineer specializes in deploying, optimizing, and maintaining large language models that run entirely on local or…
Skill Guide
This skill covers tuning Linux kernel parameters, managing hardware drivers, and optimizing disk I/O subsystems to minimize latency and maximize throughput when loading large machine learning models into memory.
Scenario
You have a PyTorch or TensorFlow script that loads a 50 GB model from a local NVMe SSD. The load time varies from run to run and seems slow for the hardware.
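A reasonable first step in this scenario is to quantify "inconsistent" before tuning anything. The sketch below times repeated sequential reads of a file and reports the spread in throughput. It is a minimal illustration, not part of the original guide: the function names, the 16 MiB chunk size, and the run count are all assumptions chosen for the example.

```python
import statistics
import time

CHUNK_SIZE = 16 * 1024 * 1024  # 16 MiB per read; an arbitrary choice for this sketch


def timed_read(path, chunk_size=CHUNK_SIZE):
    """Read the whole file sequentially; return (bytes_read, seconds)."""
    buf = bytearray(chunk_size)
    total = 0
    start = time.perf_counter()
    # buffering=0 returns a raw FileIO, so Python adds no extra buffer layer.
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(buf)
            if not n:  # 0 bytes means end of file
                break
            total += n
    return total, time.perf_counter() - start


def benchmark(path, runs=5, chunk_size=CHUNK_SIZE):
    """Repeat the read and summarize throughput in MiB/s."""
    rates = []
    for _ in range(runs):
        nbytes, secs = timed_read(path, chunk_size)
        rates.append(nbytes / (1024 * 1024) / max(secs, 1e-9))
    return {
        "min": min(rates),
        "median": statistics.median(rates),
        "max": max(rates),
    }
```

A large gap between the first run and later runs usually points at the page cache rather than the SSD. For cold-cache numbers, the cache can be dropped between runs by writing `3` to `/proc/sys/vm/drop_caches` as root.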
Scenario
An inference service container on a Kubernetes node experiences cold-start latency spikes when loading models from a shared network filesystem (NFS).
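One possible mitigation for this scenario, offered here as an assumption and not as the guide's prescribed fix, is to stage the model on node-local storage the first time it is needed so that later cold starts avoid the network read. The helper name `ensure_local_copy` and the cache layout are hypothetical, and the sketch does not check whether the remote file has changed since it was cached.

```python
import os
import shutil
import tempfile


def ensure_local_copy(remote_path, cache_dir):
    """Return a node-local path for remote_path, copying it on first use."""
    os.makedirs(cache_dir, exist_ok=True)
    local_path = os.path.join(cache_dir, os.path.basename(remote_path))
    if os.path.exists(local_path):
        return local_path  # already staged: no network read needed

    # Copy to a temporary name in the same directory, then rename, so a
    # concurrent reader never observes a half-written model file.
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir, suffix=".partial")
    os.close(fd)
    try:
        shutil.copyfile(remote_path, tmp_path)
        os.replace(tmp_path, local_path)  # atomic within one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return local_path
```

The temporary-file-plus-rename step matters when several pods on the same node start at once and share the cache directory.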
Scenario
Your team has a proprietary storage appliance with a custom protocol. The standard Linux block device driver adds 2ms of latency per I/O request due to unnecessary overhead.
Use `perf` for CPU and IPC analysis during the load, `strace` for syscall tracing, eBPF-based tools such as `biolatency` for detailed block I/O latency histograms, and `iostat`/`iotop` for real-time disk statistics.
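When these tools are not installed, a rough read-latency figure can be derived from `/proc/diskstats`, the kernel counters that `iostat` itself reads. The sketch below parses two snapshots and computes the average read latency between them. The function names are illustrative, and only the read-side fields (reads completed, sectors read, milliseconds spent reading) are used.

```python
from collections import namedtuple

ReadStats = namedtuple("ReadStats", "reads sectors ms_reading")
SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors regardless of device


def parse_diskstats(text):
    """Map device name -> ReadStats from the contents of /proc/diskstats."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        # Columns: major, minor, name, reads completed, reads merged,
        # sectors read, ms spent reading, ...
        stats[fields[2]] = ReadStats(int(fields[3]), int(fields[5]), int(fields[6]))
    return stats


def read_delta(before, after):
    """Summarize read activity between two ReadStats snapshots of one device."""
    reads = after.reads - before.reads
    ms = after.ms_reading - before.ms_reading
    nbytes = (after.sectors - before.sectors) * SECTOR_BYTES
    avg_ms = ms / reads if reads else 0.0
    return {"reads": reads, "bytes": nbytes, "avg_read_ms": avg_ms}
```

In practice, one snapshot would be taken by reading `/proc/diskstats` just before the model load and another just after, then compared for the device that holds the model (for example a hypothetical `nvme0n1`).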
Use `sysctl` for runtime kernel parameter changes, `tuned-adm` for applying predefined profiles (e.g., `throughput-performance`), `modprobe` for driver management, and `dracut` for rebuilding the initramfs with the required modules.
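Values set with `sysctl` are exposed as files under `/proc/sys`, so a deployment can check that a node matches the intended tuning before loading a model. The following sketch assumes that mapping and compares expected values with live ones. The helper names are hypothetical, and the simple dot-to-slash conversion does not handle names whose components themselves contain dots.

```python
import os


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name such as 'vm.swappiness' to its procfs path."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a stripped string, or None if unreadable."""
    try:
        with open(sysctl_path(name, root)) as f:
            return f.read().strip()
    except OSError:
        return None


def audit(expected, root="/proc/sys"):
    """Return {name: (expected, actual)} for every parameter that differs."""
    mismatches = {}
    for name, want in expected.items():
        actual = read_sysctl(name, root)
        if actual != str(want):
            mismatches[name] = (str(want), actual)
    return mismatches
```

A call such as `audit({"vm.swappiness": 10})` returns an empty dict when the node already matches. The value `10` is only a placeholder for the example, not a recommendation.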
Use `fio` for flexible I/O workload simulation, `hdparm` for low-level disk parameter tuning (caution: some of its options can cause data loss), and `bcache` or `dm-cache` for SSD caching in front of slower storage.
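To make a `fio` run resemble a model load, the job can be configured as large sequential reads. The sketch below only builds the command line. The job name, sizes, and queue depth are illustrative assumptions, and the target should be an existing file so that `fio` has no reason to lay one out.

```python
import shlex


def fio_read_cmd(filename, size="4G", block_size="1M", iodepth=32, engine="io_uring"):
    """Build a fio command that simulates a large sequential read."""
    return [
        "fio",
        "--name=model-load-sim",   # hypothetical job name
        f"--filename={filename}",
        "--rw=read",               # sequential reads
        f"--bs={block_size}",
        f"--size={size}",
        f"--ioengine={engine}",
        f"--iodepth={iodepth}",
        "--direct=1",              # bypass the page cache
        "--readonly",              # refuse any write workload as a safety check
        "--output-format=json",
    ]


if __name__ == "__main__":
    # Print the command for inspection; the path is a placeholder.
    print(shlex.join(fio_read_cmd("/path/to/model.bin")))
```

The returned list can be passed to `subprocess.run` directly, which avoids shell quoting issues with unusual file names.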
Use `io_uring` for asynchronous I/O and `O_DIRECT` to bypass the page cache for large sequential reads. Tune Transparent Huge Pages (THP) to reduce TLB misses, keeping in mind that THP can also cause latency spikes, and use cgroups v2 for I/O resource control.
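Before changing THP behavior, it helps to know what the node is currently doing. The kernel reports the setting in `/sys/kernel/mm/transparent_hugepage/enabled` as a list of options with the active one in square brackets, for example `always [madvise] never`. The sketch below extracts the active option; the function names are assumptions made for the example.

```python
import re

THP_ENABLED = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_option(text):
    """Return the bracketed (active) option from a sysfs choice string."""
    match = re.search(r"\[([^\]]+)\]", text)
    return match.group(1) if match else None


def thp_mode(path=THP_ENABLED):
    """Return the current THP mode, or None if the file cannot be read."""
    try:
        with open(path) as f:
            return active_option(f.read())
    except OSError:
        return None
```

The same parser works for the neighboring `defrag` file, which uses the same bracket convention.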