Description
When running structured/guided generation using a JSON Schema, the apple-fm-sdk appears to leak a native FMComposedPrompt pointer and its associated socket/XPC file descriptors on sequential respond() requests.
The Source of the Leak:
Inside apple_fm_sdk/session.py, in the _respond_with_schema_from_json() method:
- _composed_prompt_from_prompt(prompt) is called, which initializes a native composed prompt pointer via the C bindings:
```python
composed_prompt = self._composed_prompt_from_prompt(prompt=prompt)
# Under the hood, this calls lib.FMComposedPromptInitialize()
```
- The session creates the guided generation task pointer and submits it to the underlying compiled framework.
- In the finally: cleanup block, the session releases the native task pointer (lib.FMRelease(task)), but the composed_prompt pointer is not released:
```python
finally:
    _unregister_handle(future_handle)
    lib.FMRelease(task)
    # Note: composed_prompt is not released
    self._active_task = None
```
Impact & Observed Behavior
Because the native FMComposedPrompt structure is never released, sequential or iterative structured generation runs (such as batch processing pipelines) leak one native composed prompt structure, together with its underlying XPC/Mach connection handles, per request.
Through testing, I observed that the file descriptor leak has two overlapping sources:
- The native pointer leak: the native FMComposedPrompt pointer is leaked on the heap. Because it holds a reference to the ImageAttachment, it keeps the underlying image file descriptor open even after the LanguageModelSession Python wrapper has been destroyed and garbage-collected.
- Transcript history retention: the native LanguageModelSession transcript automatically retains previous prompts and their attachments. As a result, in a single persistent session, the file descriptors of all previous attachments stay open for the session's entire lifetime.
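To tell these two sources apart, it can help to break the open descriptors down by type instead of only counting them: a leaked image attachment should show up as a regular file, while any socket-backed channel shows up as a socket. The following is a minimal standard-library sketch; classify_open_fds is a hypothetical helper (not part of apple-fm-sdk), and it assumes /dev/fd (macOS) or /proc/self/fd (Linux) is available:

```python
import os
import stat
from collections import Counter

# Kinds are checked in order; the first matching predicate wins.
_FD_KINDS = (
    ("file", stat.S_ISREG),
    ("socket", stat.S_ISSOCK),
    ("pipe", stat.S_ISFIFO),
    ("char_device", stat.S_ISCHR),
    ("directory", stat.S_ISDIR),
)


def classify_open_fds() -> Counter:
    """Count this process's open file descriptors by kind (file, socket, pipe, ...)."""
    fd_dir = "/dev/fd" if os.path.isdir("/dev/fd") else "/proc/self/fd"
    kinds: Counter = Counter()
    for name in os.listdir(fd_dir):
        try:
            mode = os.fstat(int(name)).st_mode
        except OSError:
            # The descriptor os.listdir() used for the directory itself is
            # already closed by now, so fstat() fails on it; skip it.
            continue
        kind = next((k for k, pred in _FD_KINDS if pred(mode)), "other")
        kinds[kind] += 1
    return kinds
```

Calling it before and after each respond() in the reproducer's loop, then diffing the two Counter objects, shows which kind of descriptor is growing.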
To fully prevent these file descriptor leaks in sequential loops, both the monkey-patch (which releases composed_prompt) and recreating the session appear to be required:
- Recreating the session is the only way I found to clear the transcript history.
- Manually forcing the release of the native session resources instead (by calling the internal session._release() method in the loop) is not a viable alternative: it causes double-free crashes (EXC_BREAKPOINT / SIGTRAP in libswiftCore.dylib), because the session's __del__ destructor still runs when Python reclaims the wrapper and tries to release the raw _ptr a second time.
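The double free above is the typical symptom of a release path that is not idempotent. As a minimal sketch (NativeHandle and its release callback are hypothetical, not apple-fm-sdk APIs), a wrapper that clears its pointer before calling into native code makes a manual release and the later __del__ safe to combine:

```python
from typing import Callable, Optional


class NativeHandle:
    """Hypothetical owner of one native pointer that releases it at most once."""

    def __init__(self, ptr: int, release: Callable[[int], None]) -> None:
        self._ptr: Optional[int] = ptr
        self._release_fn = release

    def release(self) -> None:
        # Swap the pointer out *before* calling into native code, so a second
        # call (manual or from __del__) sees None and does nothing.
        ptr = getattr(self, "_ptr", None)
        self._ptr = None
        if ptr:
            self._release_fn(ptr)

    def __del__(self) -> None:
        # Runs when the wrapper is reclaimed; harmless after a manual release().
        self.release()
```

With this pattern, calling release() explicitly and then letting the object be garbage-collected invokes the native release function exactly once.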
On macOS, even though the soft file descriptor limit can be high (e.g., 1,048,575), sequential predictions with image attachments consistently fail after roughly 240 to 250 calls. From that point on, every subsequent file system open (including Python's built-in open(), PIL.Image.open(), and system plist reads) raises a fatal OSError: [Errno 9] Bad file descriptor.
Although the user-facing header constant __DARWIN_FD_SETSIZE is defined as 1024, the consistent failure threshold at ~250 open descriptors (just below 256) suggests a lower limit somewhere inside Apple's closed-source background frameworks. I have not been able to confirm which one, but plausible candidates include:
- launchd / sandbox cap on concurrent XPC connections: the background system daemon (/usr/libexec/macOSFoundationModels) that communicates with Python over XPC may be subject to a concurrent connection cap. If client channels are capped at 256, as I suspect standard sandbox templates do, then once the process has leaked ~250 channels, launchd/sandbox could reject new handshakes and leave the socket descriptor invalid (EBADF).
- CoreFoundation CFRunLoop socket registration limit: asynchronous socket events inside the framework are managed by CoreFoundation's CFRunLoop, and CFSocket/CFFileDescriptor registrations may be capped at 256 per thread run loop to prevent socket exhaustion.
- Internal select() thread cap: private Neural Engine worker threads that manage dispatch may run select() loops over active generation tasks, using pre-allocated thread-local arrays capped at 256 slots.
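Before attributing the threshold to a framework-internal cap, it may be worth confirming the RLIMIT_NOFILE values that actually apply inside the failing Python process. A process launched from an IDE, a launchd agent, or a different shell does not necessarily inherit the limit that ulimit -n reports in a terminal, and 256 is a common macOS default soft limit. Exhausting that limit would normally surface as EMFILE ("Too many open files") rather than EBADF, so this check mainly rules it out. The helpers below are a hypothetical sketch using only the standard resource module:

```python
import resource


def nofile_limits() -> tuple[int, int]:
    """Return the (soft, hard) RLIMIT_NOFILE values for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def try_raise_soft_limit(target: int) -> int:
    """Best-effort raise of the soft fd limit; return the soft limit now in effect."""
    soft, hard = nofile_limits()
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)  # the soft limit may never exceed the hard limit
    if target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # e.g. macOS rejects values above kern.maxfilesperproc
    return nofile_limits()[0]
```

Printing nofile_limits() at the start of the reproducer's main() would show whether the 1,048,575 figure really applies to that process.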
Steps to Reproduce
I have created a standalone, single-file reproducer script (afm_leak_reproducer.py) with zero dependencies other than apple-fm-sdk and pillow to demonstrate the leak.
Reproducer Code (afm_leak_reproducer.py):
```python
import argparse
import asyncio
import gc
import json
import os
import sys
import tempfile
from pathlib import Path
from typing import Any

import apple_fm_sdk as fm
from apple_fm_sdk.session import lib as fm_lib_module, _register_handle, _unregister_handle, _session_structured_callback

fm_lib: Any = fm_lib_module

# Minimal Schema representing an object
DUMMY_SCHEMA = {
    "type": "object",
    "properties": {"reply": {"type": "string"}},
    "required": ["reply"],
    "title": "ModelResponse",
    "x-order": ["reply"],
    "additionalProperties": False
}


def apply_monkey_patch():
    """Monkey-patch to explicitly release the composed prompt pointer."""
    async def patched_respond_with_schema_from_json(self, prompt: Any, json_schema: dict, options: Any = None) -> Any:
        async with self._request_lock:
            loop = asyncio.get_running_loop()
            future = loop.create_future()
            composed_prompt = self._composed_prompt_from_prompt(prompt=prompt)
            json_schema_bytes = json.dumps(json_schema).encode("utf-8")
            options_json = None
            if options is not None:
                options_json = json.dumps(options.to_dict()).encode("utf-8")
            future_handle = _register_handle(future)
            task = fm_lib.FMLanguageModelSessionRespondWithSchemaFromJSON(
                self._ptr, composed_prompt, json_schema_bytes, options_json, future_handle, _session_structured_callback
            )
            self._active_task = task
            try:
                await future
            except Exception as e:
                self._reset_task_state()
                raise e
            finally:
                _unregister_handle(future_handle)
                fm_lib.FMRelease(task)  # type: ignore
                # --- BUG FIX ---
                if composed_prompt:
                    try:
                        fm_lib.FMRelease(composed_prompt)  # type: ignore
                    except Exception:
                        pass
                # -------------------------
                self._active_task = None
            return future.result()

    fm.LanguageModelSession._respond_with_schema_from_json = patched_respond_with_schema_from_json  # type: ignore


def get_open_fds_count() -> int:
    return len(os.listdir("/dev/fd")) if os.path.exists("/dev/fd") else -1


async def run_single_prediction(session: fm.LanguageModelSession, i: int, dummy_image_path: Path) -> None:
    prompt = f"Echo hello {i}"
    attachment = fm.ImageAttachment(dummy_image_path)
    full_prompt = [prompt, attachment]
    _ = await session.respond(full_prompt, json_schema=DUMMY_SCHEMA)  # type: ignore


async def main():
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--unpatched", action="store_true")
    group.add_argument("--unpatched-recreate", action="store_true")
    group.add_argument("--patched", action="store_true")
    group.add_argument("--patched-recreate", action="store_true")
    parser.add_argument("--iterations", type=int, default=10)
    args = parser.parse_args()

    recreate_session = args.unpatched_recreate or args.patched_recreate
    use_patch = args.patched or args.patched_recreate
    if use_patch:
        apply_monkey_patch()

    from PIL import Image
    temp_dir = Path(tempfile.gettempdir())
    dummy_image_path = temp_dir / "afm_leak_dummy.jpg"
    im = Image.new("RGB", (32, 32), "white")
    im.save(dummy_image_path)

    session = None
    if not recreate_session:
        session = fm.LanguageModelSession(instructions="Reply with short replies.")

    for i in range(1, args.iterations + 1):
        active_session = session
        if recreate_session:
            active_session = fm.LanguageModelSession(instructions="Reply with short replies.")
        await run_single_prediction(active_session, i, dummy_image_path)  # type: ignore
        if recreate_session:
            active_session = None
            gc.collect()
        if i % 20 == 0 or i == 1 or i == args.iterations:
            print(f"Iteration {i}/{args.iterations} | Open File Descriptors: {get_open_fds_count()}")


if __name__ == "__main__":
    asyncio.run(main())
```
Test Results Traces (10 Iterations)
Here is the captured console trace demonstrating the behavior across all four modes:
=== AFM LEAK RESULTS ===
--- MODE: --unpatched ---
[INFO] Running UNPATCHED leaky mode (single persistent session)...
Created temporary test image at: /var/folders/h5/b8q8m36565q68y5gjd9yqczr0000gn/T/afm_leak_dummy.jpg
Initial open File Descriptors: 7
Running guided text generation with ImageAttachment for 10 iterations...
Iteration 1/10 | Open File Descriptors: 8
Iteration 10/10 | Open File Descriptors: 17
[SUCCESS] Completed all iterations successfully!
Final open File Descriptors: 17
--- MODE: --unpatched-recreate ---
[INFO] Running UNPATCHED leaky mode (re-creating session on EACH iteration)...
Created temporary test image at: /var/folders/h5/b8q8m36565q68y5gjd9yqczr0000gn/T/afm_leak_dummy.jpg
Initial open File Descriptors: 7
Running guided text generation with ImageAttachment for 10 iterations...
Iteration 1/10 | Open File Descriptors: 8
Iteration 10/10 | Open File Descriptors: 17
[SUCCESS] Completed all iterations successfully!
Final open File Descriptors: 17
--- MODE: --patched ---
[INFO] Monkey-patching apple_fm_sdk.LanguageModelSession._respond_with_schema_from_json...
[INFO] Running PATCHED leak-free mode (single persistent session)...
Created temporary test image at: /var/folders/h5/b8q8m36565q68y5gjd9yqczr0000gn/T/afm_leak_dummy.jpg
Initial open File Descriptors: 7
Running guided text generation with ImageAttachment for 10 iterations...
Iteration 1/10 | Open File Descriptors: 8
Iteration 10/10 | Open File Descriptors: 17
[SUCCESS] Completed all iterations successfully!
Final open File Descriptors: 17
--- MODE: --patched-recreate ---
[INFO] Monkey-patching apple_fm_sdk.LanguageModelSession._respond_with_schema_from_json...
[INFO] Running PATCHED leak-free mode (re-creating session on EACH iteration)...
Created temporary test image at: /var/folders/h5/b8q8m36565q68y5gjd9yqczr0000gn/T/afm_leak_dummy.jpg
Initial open File Descriptors: 7
Running guided text generation with ImageAttachment for 10 iterations...
Iteration 1/10 | Open File Descriptors: 7
Iteration 10/10 | Open File Descriptors: 7
[SUCCESS] Completed all iterations successfully!
Final open File Descriptors: 7
- Observation: the only mode with a flat, constant file descriptor count (final FDs: 7) is --patched-recreate. The other three modes all grow by one descriptor per call (8 after iteration 1, 17 after iteration 10), which suggests that both the monkey-patch and session recreation are required to plug both leak channels.
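To turn traces like these into a pass/fail check (for example, a regression test for the fix), the sampled counts can be reduced to a growth rate per call. A minimal sketch, where fd_growth_per_call is a hypothetical helper that fits a least-squares slope to (iteration, open FD count) samples such as those printed by the reproducer:

```python
def fd_growth_per_call(samples: list[tuple[int, int]]) -> float:
    """Least-squares slope of open-FD count versus iteration number."""
    if len(samples) < 2:
        raise ValueError("need at least two (iteration, fd_count) samples")
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    den = sum((x - mean_x) ** 2 for x, _ in samples)
    if den == 0:
        raise ValueError("samples must come from at least two distinct iterations")
    return num / den
```

With the samples from the traces above, fd_growth_per_call([(1, 8), (10, 17)]) returns 1.0 for the three leaking modes, and fd_growth_per_call([(1, 7), (10, 7)]) returns 0.0 for --patched-recreate. A test could therefore assert that the slope stays near zero over a longer run.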
Proposed Fix
In apple_fm_sdk/session.py, modifying the finally: block of _respond_with_schema_from_json (and any other generation methods that initialize an FMComposedPrompt pointer) to explicitly release composed_prompt seems to resolve the issue:
```python
finally:
    _unregister_handle(future_handle)
    lib.FMRelease(task)
    # --- FIX: Release the native composed prompt pointer ---
    if composed_prompt:
        try:
            lib.FMRelease(composed_prompt)
        except Exception:
            pass
    # -------------------------------------------------------
    self._active_task = None
```
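Because the same omission may exist in other generation methods that create an FMComposedPrompt pointer, one way to make the release hard to forget is to tie it to a context manager. A minimal sketch: released_on_exit is a hypothetical helper (not an SDK API), and the only assumption about the bindings is a release function such as lib.FMRelease:

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def released_on_exit(ptr: Any, release: Callable[[Any], None]) -> Iterator[Any]:
    """Yield a native pointer and release it once the block exits, even on error."""
    try:
        yield ptr
    finally:
        # Skip NULL/None pointers, mirroring the `if composed_prompt:` guard above.
        if ptr:
            release(ptr)
```

Inside _respond_with_schema_from_json, everything from creating composed_prompt through await future could then be nested under with released_on_exit(self._composed_prompt_from_prompt(prompt=prompt), lib.FMRelease) as composed_prompt:, so the pointer is released on success, on error, and on asyncio cancellation alike. Unlike the try/except in the proposed fix, this sketch lets an exception raised by the release function propagate.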