To create a low-cost server capable of running specialized LLMs locally, suitable for academic work and operation in a dorm room.
The core of the server is a Raspberry Pi 5. Through the M.2 HAT+, an M.2-to-OcuLink adapter, and an OcuLink eGPU board, the RPi5 is connected to a Radeon RX 580 graphics card, which is powered by a standard computer power supply. All of these parts except the adapters were repurposed from existing hardware.
On the software side, the server runs llama.cpp, loading models such as IBM's Granite and Alibaba's Qwen Math quantized to fit in the graphics card's 8 GB of VRAM; for example, an 8-billion-parameter model quantized to roughly 4.5 bits per weight occupies about 4.5 GB, leaving headroom for the KV cache. To mitigate the quality loss from quantization and the smaller model sizes, the user selects a model specialized for the task at hand.
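Once llama-server (llama.cpp's built-in HTTP server, which listens on port 8080 by default and exposes an OpenAI-compatible endpoint) is running on the Pi, any machine on the network can query it. The sketch below is illustrative only: the hostname rpi5-llm.local and the model filename in the comment are assumptions, not part of the build.

```python
# Minimal sketch of querying the Pi's llama-server over the local network.
# Assumes llama-server is already running on the Pi with a quantized GGUF
# loaded, e.g. (filename hypothetical):
#   llama-server -m granite-3.1-8b-instruct-Q4_K_M.gguf -ngl 99 --host 0.0.0.0
import json
import urllib.request

# Assumed address of the Pi on the dorm network.
SERVER_URL = "http://rpi5-llm.local:8080/v1/chat/completions"

def ask(prompt: str, max_tokens: int = 256) -> str:
    """Send a chat request to the llama.cpp server and return the reply text."""
    body = json.dumps({
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    # OpenAI-compatible response: first choice's message content.
    return reply["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(ask("Summarize the key idea of gradient descent in two sentences."))
```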
The back plate of the case is composed of two pieces for ease of 3D printing: a large plate for the power supply and graphics card, and a sub-plate for the Raspberry Pi and its associated power adapter. The shell of the case is designed to restrict airflow as little as possible while preserving structural integrity, allowing the graphics card to dissipate heat efficiently and sustain full performance under load.
The server effectively:

- Runs specialized models remotely for a variety of tasks (general knowledge, problem solving) with high uptime.
- Occupies a small space, produces little noticeable noise, and draws very little power.

Together, these qualities make it suitable for use in an academic environment (summarizing research, creating study materials, giving feedback on writing) while keeping all data local.
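To make the model-selection step mentioned earlier concrete, here is a small sketch of mapping each academic task to a quantized model. The GGUF filenames are hypothetical examples, not tested configurations; llama-server loads one model at a time, so switching tasks means restarting the server with a different model argument.

```python
# Hypothetical mapping from academic task to a specialized quantized model.
# Filenames are assumptions; substitute whatever GGUF builds are on the Pi.
TASK_MODELS = {
    "summarize_research": "granite-3.1-8b-instruct-Q4_K_M.gguf",
    "math_problem_solving": "qwen2.5-math-7b-instruct-Q4_K_M.gguf",
    "writing_feedback": "granite-3.1-8b-instruct-Q4_K_M.gguf",
}

def server_command(task: str) -> str:
    """Return the llama-server invocation for the model suited to `task`."""
    model = TASK_MODELS[task]
    # -ngl 99 offloads all layers to the RX 580's 8 GB of VRAM.
    return f"llama-server -m models/{model} -ngl 99 --host 0.0.0.0 --port 8080"

if __name__ == "__main__":
    print(server_command("math_problem_solving"))
```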