Native `FMComposedPrompt` Pointer and File Descriptor Leak in `LanguageModelSession._respond_with_schema_from_json()`

## Description
When running structured/guided generation using a JSON Schema, the `apple-fm-sdk` appears to leak a native `FMComposedPrompt` pointer and its associated socket/XPC file descriptors on sequential `respond()` requests.

### The Source of the Leak:
Inside `apple_fm_sdk/session.py` in the **`_respond_with_schema_from_json()`** method:
1. `_composed_prompt_from_prompt(prompt)` is called, which initializes a native composed prompt pointer via C-bindings:
   ```python
   composed_prompt = self._composed_prompt_from_prompt(prompt=prompt)
   # Under the hood, this calls lib.FMComposedPromptInitialize()
   ```
2. The session creates the guided generation `task` pointer and submits it to the underlying compiled framework.
3. In the `finally:` cleanup block, the session releases the native `task` pointer (`lib.FMRelease(task)`), but the `composed_prompt` pointer is not released:
   ```python
   finally:
       _unregister_handle(future_handle)
       lib.FMRelease(task)
       # Note: composed_prompt is not released
       self._active_task = None
   ```
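
One way to make this kind of omission harder to repeat is to tie every native handle to a context manager, so release happens on every exit path. The following is a minimal sketch, not the SDK's code: `FakeLib` and `native_handle` are hypothetical names, and only the `FMRelease` call name is taken from the snippet above.

```python
from contextlib import contextmanager


class FakeLib:
    """Hypothetical stand-in for the native bindings; records released handles."""

    def __init__(self):
        self.released = []

    def FMRelease(self, handle):
        self.released.append(handle)


@contextmanager
def native_handle(lib, handle):
    """Yield a native handle and release it once when the block exits."""
    try:
        yield handle
    finally:
        if handle:
            lib.FMRelease(handle)


lib = FakeLib()
try:
    # Handles are released in reverse order of acquisition.
    with native_handle(lib, "composed_prompt") as prompt, \
            native_handle(lib, "task") as task:
        raise RuntimeError("generation failed")
except RuntimeError:
    pass

# Both handles were released even though the body raised.
print(lib.released)
```

With this shape, adding a new native pointer to a method means adding one more `with` item rather than remembering one more line in a `finally:` block.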

---

## Impact & Observed Behavior
Because the native `FMComposedPrompt` structure is never released, each structured generation request leaks one composed prompt and whatever XPC/Mach connection handles it holds. The effect accumulates in sequential or iterative runs such as batch processing pipelines.

Through testing, I observed that file descriptor leakage is caused by two overlapping sources:
1. **The Native Pointer Leak:** The native `FMComposedPrompt` pointer is leaked on the heap. Since it holds onto the visual `ImageAttachment` reference, it keeps the underlying image file descriptor open even if the `LanguageModelSession` python wrapper is completely destroyed and garbage-collected.
2. **The Transcript History Retain:** The native `LanguageModelSession` transcript history automatically retains previous prompts and attachments. Therefore, in a single persistent session run, previous attachment file descriptors are kept open throughout the session's lifetime.

Preventing both leaks in a sequential loop appears to require applying the monkey-patch and recreating the session for each request:
* The patch releases the composed prompt pointer, while recreating the session is what clears the transcript history.
* Manually forcing the release of the native session resources (by calling the internal `session._release()` method in a loop) is not a safe substitute. It leads to duplicate deallocation and double-free crashes (`EXC_BREAKPOINT / SIGTRAP` in `libswiftCore.dylib`), because Python's garbage collector later runs the session destructor `__del__`, which tries to release the raw `_ptr` again.
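
The double free described in the second point is the usual result of a release method that does not clear its pointer. Below is a sketch of an idempotent guard under that assumption. `NativeHandle` and `release_fn` are hypothetical names, and whether the SDK's `_release()` currently clears `_ptr` is inferred only from the crash above.

```python
class NativeHandle:
    """Hypothetical wrapper that releases a native pointer at most once."""

    def __init__(self, ptr, release_fn):
        self._ptr = ptr
        self._release_fn = release_fn

    def release(self):
        # Clear the stored pointer before releasing, so any later call
        # (including the one made by __del__) becomes a no-op.
        ptr, self._ptr = self._ptr, None
        if ptr is not None:
            self._release_fn(ptr)

    def __del__(self):
        self.release()


calls = []
handle = NativeHandle(ptr=0x1234, release_fn=calls.append)
handle.release()  # explicit release
handle.release()  # second explicit call does nothing
del handle        # destructor finds the pointer already cleared
print(calls)
```

The release function is invoked once no matter how many times `release()` is called, which is the property an explicit per-iteration cleanup would need.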

Under macOS, even though the soft file descriptor limit can be high (e.g., `1,048,575`), sequential predictions consistently fail after roughly **240-250 sequential calls with image attachments**. From that point on, the process raises a fatal **`OSError: [Errno 9] Bad file descriptor`** on any subsequent file system open (including standard Python `open()`, `PIL.Image.open()`, or system plist reads).
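
To rule out the per-process descriptor limit on a given machine, the limit and the current descriptor count can be read side by side. This is a small sketch using only the standard library; `fd_headroom` is a hypothetical helper name, and the `/dev/fd` count is the same approach the reproducer uses.

```python
import os
import resource


def fd_headroom():
    """Return (open_fds, soft_limit, hard_limit) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # /dev/fd lists this process's descriptors on macOS and Linux. The count
    # may include the descriptor used to read the directory itself.
    open_fds = len(os.listdir("/dev/fd")) if os.path.isdir("/dev/fd") else -1
    return open_fds, soft, hard


open_fds, soft, hard = fd_headroom()
print(f"open={open_fds} soft_limit={soft} hard_limit={hard}")
```

If the open count is far below the soft limit when the failures begin, `RLIMIT_NOFILE` is unlikely to be the limit being hit.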

While the user-facing header limit `__DARWIN_FD_SETSIZE` is defined as `1024`, the consistent failure threshold at ~250 open files points to a limit near `256` somewhere in Apple's closed-source background frameworks. The candidates below are unverified hypotheses:
1. **launchd / Sandbox Concurrent XPC Connection Cap:** The background system daemon (`/usr/libexec/macOSFoundationModels`) communicating with Python via XPC may be subject to a concurrent connection cap. If the sandbox limits concurrent client channels to `256`, then once the process leaks more than ~250 channels, launchd/sandbox could reject new handshakes, leaving the socket descriptor invalid (`EBADF`).
2. **CoreFoundation `CFRunLoop` Socket Registration Limit:** Asynchronous socket events inside the framework may be managed by CoreFoundation's `CFRunLoop`. A cap of `256` CFSocket/CFFileDescriptor registrations per thread run loop would produce the same threshold.
3. **Internal C++ `select()` Thread Capping:** Private Neural Engine worker threads managing dispatch may run `select()` loops over active generation tasks, using pre-allocated thread-local arrays capped at `256` slots.
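
One way to narrow these hypotheses down is to check which kind of descriptor is actually growing. The sketch below counts open descriptors by type with `os.fstat`; `classify_open_fds` is a hypothetical helper and is not part of the reproducer. Mach ports are not file descriptors, so they will not show up in this count.

```python
import os
import stat
import tempfile
from collections import Counter


def classify_open_fds(max_fd=4096):
    """Count this process's open descriptors by kind using os.fstat."""
    kinds = Counter()
    for fd in range(max_fd):
        try:
            mode = os.fstat(fd).st_mode
        except OSError:
            continue  # this descriptor number is not open
        if stat.S_ISSOCK(mode):
            kinds["socket"] += 1
        elif stat.S_ISREG(mode):
            kinds["file"] += 1
        elif stat.S_ISFIFO(mode):
            kinds["pipe"] += 1
        elif stat.S_ISCHR(mode):
            kinds["chardev"] += 1
        else:
            kinds["other"] += 1
    return kinds


before = classify_open_fds()
with tempfile.TemporaryFile() as extra:
    during = classify_open_fds()

# Counter subtraction keeps only the kinds that grew.
print(during - before)
```

Calling `classify_open_fds()` before and after each `respond()` in the reproducer would show whether the growth is in regular files (pointing to the retained image attachments) or in sockets.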

---

## Steps to Reproduce
I have created a standalone, single-file reproducer script (`afm_leak_reproducer.py`) with zero dependencies other than `apple-fm-sdk` and `pillow` to demonstrate the leak.

### Reproducer Code (`afm_leak_reproducer.py`):
```python
import argparse
import asyncio
import gc
import json
import os
import sys
import tempfile
from pathlib import Path
from typing import Any
import apple_fm_sdk as fm
from apple_fm_sdk.session import lib as fm_lib_module, _register_handle, _unregister_handle, _session_structured_callback
fm_lib: Any = fm_lib_module

# Minimal Schema representing an object
DUMMY_SCHEMA = {
    "type": "object",
    "properties": {"reply": {"type": "string"}},
    "required": ["reply"],
    "title": "ModelResponse",
    "x-order": ["reply"],
    "additionalProperties": False
}

def apply_monkey_patch():
    """Monkey-patch to explicitly release the composed prompt pointer."""
    async def patched_respond_with_schema_from_json(self, prompt: Any, json_schema: dict, options: Any = None) -> Any:
        async with self._request_lock:
            loop = asyncio.get_running_loop()
            future = loop.create_future()
            composed_prompt = self._composed_prompt_from_prompt(prompt=prompt)
            json_schema_bytes = json.dumps(json_schema).encode("utf-8")
            options_json = None
            if options is not None:
                options_json = json.dumps(options.to_dict()).encode("utf-8")
            future_handle = _register_handle(future)
            task = fm_lib.FMLanguageModelSessionRespondWithSchemaFromJSON(
                self._ptr, composed_prompt, json_schema_bytes, options_json, future_handle, _session_structured_callback
            )
            self._active_task = task
            try:
                await future
            except Exception as e:
                self._reset_task_state()
                raise e
            finally:
                _unregister_handle(future_handle)
                fm_lib.FMRelease(task)  # type: ignore
                
                # --- BUG FIX ---
                if composed_prompt:
                    try:
                        fm_lib.FMRelease(composed_prompt)  # type: ignore
                    except Exception:
                        pass
                # -------------------------
                self._active_task = None
            return future.result()
    fm.LanguageModelSession._respond_with_schema_from_json = patched_respond_with_schema_from_json  # type: ignore

def get_open_fds_count() -> int:
    return len(os.listdir("/dev/fd")) if os.path.exists("/dev/fd") else -1

async def run_single_prediction(session: fm.LanguageModelSession, i: int, dummy_image_path: Path) -> None:
    prompt = f"Echo hello {i}"
    attachment = fm.ImageAttachment(dummy_image_path)
    full_prompt = [prompt, attachment]
    _ = await session.respond(full_prompt, json_schema=DUMMY_SCHEMA)  # type: ignore

async def main():
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--unpatched", action="store_true")
    group.add_argument("--unpatched-recreate", action="store_true")
    group.add_argument("--patched", action="store_true")
    group.add_argument("--patched-recreate", action="store_true")
    parser.add_argument("--iterations", type=int, default=10)
    args = parser.parse_args()

    recreate_session = args.unpatched_recreate or args.patched_recreate
    use_patch = args.patched or args.patched_recreate

    if use_patch:
        apply_monkey_patch()

    from PIL import Image
    temp_dir = Path(tempfile.gettempdir())
    dummy_image_path = temp_dir / "afm_leak_dummy.jpg"
    im = Image.new("RGB", (32, 32), "white")
    im.save(dummy_image_path)

    session = None
    if not recreate_session:
        session = fm.LanguageModelSession(instructions="Reply with short replies.")

    for i in range(1, args.iterations + 1):
        active_session = session
        if recreate_session:
            active_session = fm.LanguageModelSession(instructions="Reply with short replies.")
        
        await run_single_prediction(active_session, i, dummy_image_path)  # type: ignore
        
        if recreate_session:
            active_session = None
        gc.collect()
        
        if i % 20 == 0 or i == 1 or i == args.iterations:
            print(f"Iteration {i}/{args.iterations} | Open File Descriptors: {get_open_fds_count()}")

if __name__ == "__main__":
    asyncio.run(main())
```

---

## Test Results Traces (10 Iterations)
Here is the captured console trace demonstrating the behavior across all four modes:

```text
=== AFM LEAK RESULTS ===

--- MODE: --unpatched ---
[INFO] Running UNPATCHED leaky mode (single persistent session)...
Created temporary test image at: /var/folders/h5/b8q8m36565q68y5gjd9yqczr0000gn/T/afm_leak_dummy.jpg
Initial open File Descriptors: 7
Running guided text generation with ImageAttachment for 10 iterations...
Iteration 1/10 | Open File Descriptors: 8
Iteration 10/10 | Open File Descriptors: 17

[SUCCESS] Completed all iterations successfully!
Final open File Descriptors: 17

--- MODE: --unpatched-recreate ---
[INFO] Running UNPATCHED leaky mode (re-creating session on EACH iteration)...
Created temporary test image at: /var/folders/h5/b8q8m36565q68y5gjd9yqczr0000gn/T/afm_leak_dummy.jpg
Initial open File Descriptors: 7
Running guided text generation with ImageAttachment for 10 iterations...
Iteration 1/10 | Open File Descriptors: 8
Iteration 10/10 | Open File Descriptors: 17

[SUCCESS] Completed all iterations successfully!
Final open File Descriptors: 17

--- MODE: --patched ---
[INFO] Monkey-patching apple_fm_sdk.LanguageModelSession._respond_with_schema_from_json...
[INFO] Running PATCHED leak-free mode (single persistent session)...
Created temporary test image at: /var/folders/h5/b8q8m36565q68y5gjd9yqczr0000gn/T/afm_leak_dummy.jpg
Initial open File Descriptors: 7
Running guided text generation with ImageAttachment for 10 iterations...
Iteration 1/10 | Open File Descriptors: 8
Iteration 10/10 | Open File Descriptors: 17

[SUCCESS] Completed all iterations successfully!
Final open File Descriptors: 17

--- MODE: --patched-recreate ---
[INFO] Monkey-patching apple_fm_sdk.LanguageModelSession._respond_with_schema_from_json...
[INFO] Running PATCHED leak-free mode (re-creating session on EACH iteration)...
Created temporary test image at: /var/folders/h5/b8q8m36565q68y5gjd9yqczr0000gn/T/afm_leak_dummy.jpg
Initial open File Descriptors: 7
Running guided text generation with ImageAttachment for 10 iterations...
Iteration 1/10 | Open File Descriptors: 7
Iteration 10/10 | Open File Descriptors: 7

[SUCCESS] Completed all iterations successfully!
Final open File Descriptors: 7
```

* **Observation:** The only mode with a flat file descriptor count (`Final open File Descriptors: 7`) is `--patched-recreate`. The other three modes all end at 17, suggesting that both the monkey patch and session recreation are required to plug both leak channels.
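
The three leaking traces grow at the same rate, which can be read straight off the numbers above. The arithmetic below is a rough linear extrapolation from a 10-iteration trace, so treat the projection as an assumption rather than a measurement. `leak_rate` is a hypothetical helper.

```python
def leak_rate(first_iter, first_fds, last_iter, last_fds):
    """Descriptors leaked per iteration between two trace lines."""
    return (last_fds - first_fds) / (last_iter - first_iter)


# From the leaking traces: iteration 1 shows 8 descriptors, iteration 10 shows 17.
rate = leak_rate(1, 8, 10, 17)

# Baseline of 7 descriptors before the first request, as reported in the trace.
baseline = 7
projected = {n: baseline + rate * n for n in (240, 250)}
print(rate, projected)
```

At one leaked descriptor per request, the process would hold about 247 to 257 descriptors after 240 to 250 calls, which brackets 256.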

---

## Proposed Fix
In `apple_fm_sdk/session.py`, modifying the `finally:` block of `_respond_with_schema_from_json` (and any other generation methods that initialize an `FMComposedPrompt` pointer) to explicitly release `composed_prompt` seems to resolve the issue:

```python
        finally:
            _unregister_handle(future_handle)
            lib.FMRelease(task)
            
            # --- FIX: Release the native composed prompt pointer ---
            if composed_prompt:
                try:
                    lib.FMRelease(composed_prompt)
                except Exception:
                    pass
            # -------------------------------------------------------
            
            self._active_task = None
```

