RoVid-X Caption Model Review

10 selected videos. Models: Qwen 3.5 9B, Qwen 3.5 27B, Cosmos 3 Nano (16B), Cosmos 3 Super (64B). Prompts: p08: Cosmos 3 schema; p09: Reproduction schema. Generated 2026-06-10T18:57:46+0000. Source manifest 2026-06-09T21:01:36+0000.

rovid_000

Qwen 3.5 9B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 9B · 102.4s · 2026-06-10T17:40:29+0000

Short Read

At the beginning, two robotic arms approach two orange juice bottles lying horizontally in a white tray. In the early phase, the left arm grasps the left bottle and lifts it out of the tray. By the middle, the left arm holds the bottle above the blue surface while the right arm reaches for the second bottle. In the late phase, the right arm successfully grasps the second bottle and lifts it. At the end, both bottles are held by the respective robotic arms, suspended above the blue surface.

Training Caption Field

In a top-down view, two robotic arms with white and black components and green LED indicators manipulate two orange juice bottles on a blue cloth surface. Initially, the bottles lie horizontally in a white plastic tray. The left robotic arm grasps the left bottle and lifts it out of the tray, holding it above the blue surface. Simultaneously, the right robotic arm approaches the second bottle, grasps it, and lifts it out of the tray. By the end, both robotic arms hold their respective bottles suspended in the air, demonstrating precise manipulation and object permanence.

Full output JSON/text
{
  "scene_imagination": "A top-down view of a robotic manipulation task on a blue cloth surface. Two white and black robotic arms with green indicator lights are positioned on either side of a white plastic tray. Inside the tray lie two plastic bottles filled with orange liquid, likely juice, with visible labels. The robot arms are actively engaged in grasping and moving the bottles.",
  "temporal_caption": "At the beginning, two robotic arms approach two orange juice bottles lying horizontally in a white tray. In the early phase, the left arm grasps the left bottle and lifts it out of the tray. By the middle, the left arm holds the bottle above the blue surface while the right arm reaches for the second bottle. In the late phase, the right arm successfully grasps the second bottle and lifts it. At the end, both bottles are held by the respective robotic arms, suspended above the blue surface.",
  "audio_description": null,
  "subjects": [
    {
      "description": "Left robotic arm",
      "appearance_details": "White and black mechanical structure with articulated joints and a gripper end-effector featuring a green LED light.",
      "relationship": "Manipulating the left orange juice bottle.",
      "location": "Left side of the frame, hovering over the blue surface.",
      "relative_size": "Large, comparable to the tray.",
      "orientation": "Angled towards the center.",
      "pose": "Dynamic, moving from a resting position to a grasping and lifting pose.",
      "action": "Grasping and lifting the left bottle.",
      "state_changes": "Moves from idle to active manipulation."
    },
    {
      "description": "Right robotic arm",
      "appearance_details": "White and black mechanical structure with articulated joints and a gripper end-effector featuring a green LED light.",
      "relationship": "Manipulating the right orange juice bottle.",
      "location": "Right side of the frame, hovering over the blue surface.",
      "relative_size": "Large, comparable to the tray.",
      "orientation": "Angled towards the center.",
      "pose": "Dynamic, moving from a resting position to a grasping and lifting pose.",
      "action": "Grasping and lifting the right bottle.",
      "state_changes": "Moves from idle to active manipulation."
    },
    {
      "description": "Left orange juice bottle",
      "appearance_details": "Plastic bottle containing orange liquid with a label featuring fruit imagery.",
      "relationship": "Being manipulated by the left robotic arm.",
      "location": "Initially in the tray, then held above the blue surface.",
      "relative_size": "Small compared to the robot arms.",
      "orientation": "Horizontal, then tilted as it is lifted.",
      "pose": "Lying flat, then suspended.",
      "action": "Being lifted out of the tray.",
      "state_changes": "Transitions from resting in the tray to being held in the air."
    },
    {
      "description": "Right orange juice bottle",
      "appearance_details": "Plastic bottle containing orange liquid with a label featuring fruit imagery.",
      "relationship": "Being manipulated by the right robotic arm.",
      "location": "Initially in the tray, then held above the blue surface.",
      "relative_size": "Small compared to the robot arms.",
      "orientation": "Horizontal, then tilted as it is lifted.",
      "pose": "Lying flat, then suspended.",
      "action": "Being lifted out of the tray.",
      "state_changes": "Transitions from resting in the tray to being held in the air."
    }
  ],
  "background_setting": "A flat surface covered with a bright blue cloth. A white plastic tray sits centrally on the cloth.",
  "lighting": {
    "conditions": "Bright, diffuse indoor lighting.",
    "direction": "Overhead, creating soft shadows.",
    "shadows": "Soft shadows cast by the robot arms and bottles onto the blue cloth.",
    "illumination_effect": "Clear visibility of object details and robot mechanics."
  },
  "aesthetics": {
    "composition": "Symmetrical arrangement of robot arms flanking the central tray.",
    "color_scheme": "Dominant blue background with white and black robots and orange liquid accents.",
    "mood_atmosphere": "Clinical, precise, and functional.",
    "patterns": "None significant."
  },
  "cinematography": {
    "framing": "Top-down, close-up shot.",
    "camera_angle": "High angle, looking down at the table surface.",
    "depth_of_field": "Shallow depth of field, keeping the tray and bottles in focus.",
    "focus": "Sharp focus on the robot grippers and bottles.",
    "lens_focal_length": "Unknown, likely a standard lens for close-up work.",
    "camera_motion": "Static camera position."
  },
  "style_medium": "robotics video",
  "artistic_style": "real-world robot manipulation footage",
  "context": "A demonstration of robotic dexterity and object manipulation, likely for research or testing purposes.",
  "actions": [
    {
      "time": "0:00-0:01",
      "description": "The left robotic arm approaches and grasps the left orange juice bottle.",
      "actor": "Left robotic arm",
      "object": "Left orange juice bottle",
      "contact_event": "Gripper closes around the bottle.",
      "state_change": "Bottle is lifted from the tray.",
      "evidence": "Visual confirmation of the bottle being lifted."
    },
    {
      "time": "0:01-0:02",
      "description": "The left arm holds the bottle while the right arm approaches the second bottle.",
      "actor": "Right robotic arm",
      "object": "Right orange juice bottle",
      "contact_event": "Gripper approaches the bottle.",
      "state_change": "Bottle remains in the tray.",
      "evidence": "Visual confirmation of the right arm's approach."
    },
    {
      "time": "0:02-0:03",
      "description": "The right robotic arm grasps the right orange juice bottle.",
      "actor": "Right robotic arm",
      "object": "Right orange juice bottle",
      "contact_event": "Gripper closes around the bottle.",
      "state_change": "Bottle is lifted from the tray.",
      "evidence": "Visual confirmation of the bottle being lifted."
    },
    {
      "time": "0:03-0:04",
      "description": "Both robotic arms hold their respective bottles above the blue surface.",
      "actor": "Both robotic arms",
      "object": "Both orange juice bottles",
      "contact_event": "Grippers maintain hold on the bottles.",
      "state_change": "Bottles are suspended in the air.",
      "evidence": "Visual confirmation of both bottles being held."
    }
  ],
  "text_and_signage_elements": [
    {
      "text": "Unknown text on bottle labels",
      "category": "Product labeling",
      "appearance": "Printed on the labels of the orange juice bottles.",
      "spatial": "On the bottles.",
      "spatial_temporal": "Visible throughout the video.",
      "context": "Identifies the product being manipulated."
    }
  ],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "0:00-0:01",
      "description": "The left robotic arm grasps and lifts the left bottle.",
      "key_changes": "Left bottle is lifted.",
      "camera": "Static.",
      "robot_motion": "Left arm moves and lifts.",
      "object_motion": "Left bottle is lifted.",
      "physical_logic": "Grasp and lift action."
    },
    {
      "segment_index": 2,
      "time_range": "0:01-0:02",
      "description": "The right robotic arm approaches the right bottle.",
      "key_changes": "Right arm approaches.",
      "camera": "Static.",
      "robot_motion": "Right arm moves.",
      "object_motion": "None.",
      "physical_logic": "Approach for grasp."
    },
    {
      "segment_index": 3,
      "time_range": "0:02-0:03",
      "description": "The right robotic arm grasps and lifts the right bottle.",
      "key_changes": "Right bottle is lifted.",
      "camera": "Static.",
      "robot_motion": "Right arm moves and lifts.",
      "object_motion": "Right bottle is lifted.",
      "physical_logic": "Grasp and lift action."
    },
    {
      "segment_index": 4,
      "time_range": "0:03-0:04",
      "description": "Both robotic arms hold their respective bottles.",
      "key_changes": "Both bottles are held.",
      "camera": "Static.",
      "robot_motion": "Stable hold.",
      "object_motion": "None.",
      "physical_logic": "Sustained grasp."
    }
  ],
  "transitions": [
    "Left bottle lifted.",
    "Right arm approaches.",
    "Right bottle lifted.",
    "Both bottles held."
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "4:3",
  "duration": "5.767s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Dual robotic arms with white and black components.",
    "end_effector": "Grippers with green LED indicators.",
    "manipulated_objects": "Two orange juice bottles.",
    "task_goal": "Lift and hold the bottles.",
    "initial_state": "Bottles in tray, robots idle.",
    "terminal_state": "Bottles held by robots.",
    "affordances": "Grippers designed for grasping cylindrical objects.",
    "failure_modes": "None observed."
  },
  "physical_reasoning": {
    "contact_and_grasp": "Robots successfully grasp the bottles using their grippers.",
    "support_and_stability": "Robots maintain a stable hold on the bottles.",
    "force_gravity_friction": "Robots overcome gravity to lift the bottles.",
    "causal_chain": "Grasp leads to lift, which leads to holding.",
    "object_permanence": "Bottles remain intact throughout the manipulation.",
    "next_action_prediction": "Robots may place the bottles elsewhere or perform further manipulation."
  },
  "uncertainties": [
    "Exact text on bottle labels is not fully legible."
  ],
  "comprehensive_t2v_caption": "In a top-down view, two robotic arms with white and black components and green LED indicators manipulate two orange juice bottles on a blue cloth surface. Initially, the bottles lie horizontally in a white plastic tray. The left robotic arm grasps the left bottle and lifts it out of the tray, holding it above the blue surface. Simultaneously, the right robotic arm approaches the second bottle, grasps it, and lifts it out of the tray. By the end, both robotic arms hold their respective bottles suspended in the air, demonstrating precise manipulation and object permanence."
}
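
Reviewer note: the actions and segments in this output end at 0:04, while the reported duration is 5.767s. Below is a minimal sketch of how that gap could be checked automatically. The helper names (to_seconds, parse_range, uncovered_tail) are hypothetical and not part of the review tooling, and the parser assumes only the timestamp styles that appear in this report (0:01, 00:02, 2.0s).

```python
import re


def to_seconds(stamp):
    """Convert 'M:SS', 'MM:SS', or '2.5s' style stamps to seconds."""
    stamp = stamp.strip()
    if stamp.endswith("s"):
        return float(stamp[:-1])
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def parse_range(time_range):
    """Split 'start-end' (with or without spaces around '-') into seconds."""
    start, end = re.split(r"\s*-\s*", time_range.strip(), maxsplit=1)
    return to_seconds(start), to_seconds(end)


def uncovered_tail(time_ranges, duration):
    """Seconds of video after the last described range (0.0 if fully covered)."""
    last_end = max(parse_range(r)[1] for r in time_ranges)
    return max(0.0, to_seconds(duration) - last_end)


# Time ranges copied from the "actions" list above; duration from "duration".
ranges_9b = ["0:00-0:01", "0:01-0:02", "0:02-0:03", "0:03-0:04"]
tail = uncovered_tail(ranges_9b, "5.767s")
print(f"undescribed tail: {tail:.3f}s")
```

A non-zero tail only flags the output for a closer look. It does not show which part of the video the model skipped.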

Qwen 3.5 27B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 27B · 178.8s · 2026-06-10T17:56:15+0000

Short Read

At the beginning, two robotic arms hover over a white tray containing two orange juice bottles. In the early phase, the left robotic arm reaches in and grasps the left bottle. By the middle of the sequence, the left arm lifts the bottle out of the tray and places it onto the blue surface below. Simultaneously, the right robotic arm reaches for the remaining bottle in the tray. Towards the end, the right arm successfully grasps the second bottle and begins to lift it out of the tray.

Training Caption Field

A dual-arm robotic system performs a pick-and-place task on a blue surface. Initially, two plastic bottles filled with orange liquid rest horizontally in a white plastic tray. The left robotic arm, white and grey with green indicator lights, reaches in and firmly grasps the left bottle. It then lifts the bottle out of the tray and places it onto the blue surface below. As the left arm completes its action, the right robotic arm, identical in appearance, reaches into the tray and grasps the remaining bottle, beginning to lift it. The scene is brightly lit from above, highlighting the smooth plastic surfaces and the precise movements of the robotic grippers. The camera remains static throughout, providing a clear view of the manipulation process.

Full output JSON/text
{
  "scene_imagination": "A close-up, high-angle shot captures a robotic manipulation task on a blue surface. Two white and grey robotic arms with multi-fingered grippers are positioned on either side of a white plastic tray. Inside the tray lie two plastic bottles filled with orange liquid, labeled with orange imagery. The lighting is bright and diffuse, casting soft shadows. The camera is static, focusing on the interaction between the robots and the bottles.",
  "temporal_caption": "At the beginning, two robotic arms hover over a white tray containing two orange juice bottles. In the early phase, the left robotic arm reaches in and grasps the left bottle. By the middle of the sequence, the left arm lifts the bottle out of the tray and places it onto the blue surface below. Simultaneously, the right robotic arm reaches for the remaining bottle in the tray. Towards the end, the right arm successfully grasps the second bottle and begins to lift it out of the tray.",
  "audio_description": null,
  "subjects": [
    {
      "description": "Left robotic arm",
      "appearance_details": "White and grey body with black joints and green indicator lights on the fingers.",
      "relationship": "Manipulator",
      "location": "Left side of the frame",
      "relative_size": "Large",
      "orientation": "Facing right",
      "pose": "Reaching and grasping",
      "action": "Grasps and removes the left bottle from the tray.",
      "state_changes": "Moves from hovering to grasping to lifting."
    },
    {
      "description": "Right robotic arm",
      "appearance_details": "White and grey body with black joints and green indicator lights on the fingers.",
      "relationship": "Manipulator",
      "location": "Right side of the frame",
      "relative_size": "Large",
      "orientation": "Facing left",
      "pose": "Reaching and grasping",
      "action": "Grasps the right bottle from the tray.",
      "state_changes": "Moves from hovering to grasping."
    },
    {
      "description": "Orange juice bottles",
      "appearance_details": "Clear plastic bottles with orange liquid and labels featuring oranges.",
      "relationship": "Manipulated objects",
      "location": "Initially in the tray, then on the blue surface.",
      "relative_size": "Medium",
      "orientation": "Horizontal",
      "pose": "Lying down",
      "action": "Being moved from the tray to the table.",
      "state_changes": "Moved from contained to uncontained state."
    },
    {
      "description": "White plastic tray",
      "appearance_details": "Rectangular container with raised edges.",
      "relationship": "Container",
      "location": "Center of the frame",
      "relative_size": "Medium",
      "orientation": "Horizontal",
      "pose": "Stationary",
      "action": "Holds the bottles initially.",
      "state_changes": "Becomes empty as bottles are removed."
    }
  ],
  "background_setting": "A simple workspace setup featuring a blue cloth or mat covering a table surface. A white plastic tray sits centrally on the blue surface.",
  "lighting": {
    "conditions": "Bright, indoor artificial lighting.",
    "direction": "Overhead and slightly from the front.",
    "shadows": "Soft shadows cast by the robots and bottles onto the tray and blue surface.",
    "illumination_effect": "Even illumination with slight glare on the plastic bottles."
  },
  "aesthetics": {
    "composition": "Symmetrical framing with robots on opposite sides converging on the central tray.",
    "color_scheme": "Dominant blue background, white robots and tray, and orange accents from the bottles.",
    "mood_atmosphere": "Clinical, precise, and functional.",
    "patterns": "Repetitive structure of the robotic fingers and the parallel placement of the bottles."
  },
  "cinematography": {
    "framing": "Close-up shot focusing on the manipulation area.",
    "camera_angle": "High angle, looking down at the workspace.",
    "depth_of_field": "Deep depth of field, keeping the entire workspace in focus.",
    "focus": "Sharp focus on the robots and bottles.",
    "lens_focal_length": "Unknown, likely standard or slightly wide.",
    "camera_motion": "Static camera position."
  },
  "style_medium": "robotics video",
  "artistic_style": "real-world robot manipulation footage",
  "context": "Inferred task: A dual-arm robotic system is performing a pick-and-place operation, removing objects from a container.",
  "actions": [
    {
      "time": "00:00 - 00:02",
      "description": "The left robotic arm approaches and grasps the left bottle.",
      "actor": "Left robotic arm",
      "object": "Left orange juice bottle",
      "contact_event": "Gripper closes around the bottle body.",
      "state_change": "Bottle transitions from resting to being held.",
      "evidence": "Visible finger closure and bottle displacement."
    },
    {
      "time": "00:02 - 00:04",
      "description": "The left arm lifts the bottle out of the tray and places it on the blue surface.",
      "actor": "Left robotic arm",
      "object": "Left orange juice bottle",
      "contact_event": "Gripper releases the bottle onto the surface.",
      "state_change": "Bottle moves from tray to table.",
      "evidence": "Bottle is seen leaving the tray and settling on the blue cloth."
    },
    {
      "time": "00:04 - 00:06",
      "description": "The right robotic arm grasps the remaining bottle in the tray.",
      "actor": "Right robotic arm",
      "object": "Right orange juice bottle",
      "contact_event": "Gripper closes around the bottle body.",
      "state_change": "Bottle transitions from resting to being held.",
      "evidence": "Visible finger closure and slight bottle movement."
    }
  ],
  "text_and_signage_elements": [
    {
      "text": "Unknown characters and graphics",
      "category": "Product label",
      "appearance": "White label with orange imagery and text on the bottles.",
      "spatial": "Wrapped around the body of the bottles.",
      "spatial_temporal": "Visible throughout the video as bottles are manipulated.",
      "context": "Identifies the product as orange juice or a similar beverage."
    }
  ],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "00:00 - 00:02",
      "description": "Initial state with both bottles in the tray and robots approaching.",
      "key_changes": "Left robot initiates grasp.",
      "camera": "Static",
      "robot_motion": "Left arm moves towards bottle.",
      "object_motion": "None initially.",
      "physical_logic": "Approach phase of manipulation."
    },
    {
      "segment_index": 2,
      "time_range": "00:02 - 00:04",
      "description": "Left robot lifts and places the first bottle.",
      "key_changes": "First bottle removed from tray.",
      "camera": "Static",
      "robot_motion": "Left arm lifts and lowers.",
      "object_motion": "Bottle moves from tray to table.",
      "physical_logic": "Pick and place execution."
    },
    {
      "segment_index": 3,
      "time_range": "00:04 - 00:06",
      "description": "Right robot grasps the second bottle.",
      "key_changes": "Second bottle is grasped.",
      "camera": "Static",
      "robot_motion": "Right arm moves towards bottle.",
      "object_motion": "Bottle slightly shifts as grasped.",
      "physical_logic": "Second grasp initiation."
    }
  ],
  "transitions": [
    "Left bottle moves from tray to table.",
    "Right bottle transitions from resting to being grasped."
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "16:9",
  "duration": "5.767s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Dual-arm robotic system with anthropomorphic grippers.",
    "end_effector": "Multi-fingered grippers with green status lights.",
    "manipulated_objects": "Two plastic bottles of orange liquid.",
    "task_goal": "Remove bottles from the tray.",
    "initial_state": "Bottles in tray, robots idle.",
    "terminal_state": "One bottle on table, one bottle being lifted.",
    "affordances": "Grippers designed for grasping cylindrical objects.",
    "failure_modes": "Potential slippage or collision if not coordinated."
  },
  "physical_reasoning": {
    "contact_and_grasp": "Robots use friction and enclosure to grasp the smooth plastic bottles.",
    "support_and_stability": "Bottles are stable in the tray until lifted; stable on the flat blue surface after placement.",
    "force_gravity_friction": "Gravity acts on the bottles; robots apply upward force to lift and friction to hold.",
    "causal_chain": "Grasp leads to lift, which leads to placement.",
    "object_permanence": "Bottles remain visible and consistent throughout the manipulation.",
    "next_action_prediction": "The right arm will likely lift the second bottle out of the tray."
  },
  "uncertainties": [
    "Exact brand or text on the bottle labels is not clearly legible.",
    "The specific control algorithm or coordination between the two arms is not visible."
  ],
  "comprehensive_t2v_caption": "A dual-arm robotic system performs a pick-and-place task on a blue surface. Initially, two plastic bottles filled with orange liquid rest horizontally in a white plastic tray. The left robotic arm, white and grey with green indicator lights, reaches in and firmly grasps the left bottle. It then lifts the bottle out of the tray and places it onto the blue surface below. As the left arm completes its action, the right robotic arm, identical in appearance, reaches into the tray and grasps the remaining bottle, beginning to lift it. The scene is brightly lit from above, highlighting the smooth plastic surfaces and the precise movements of the robotic grippers. The camera remains static throughout, providing a clear view of the manipulation process."
}
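
Reviewer note: this output reports an aspect_ratio of 16:9 alongside a resolution of 1152 x 816, whereas the 9B output above reports 4:3 for the same resolution. Below is a minimal sketch for comparing a reported label against the pixel dimensions. The function names are hypothetical, and treating the label as a plain "W:H" string is an assumption about the schema.

```python
from math import gcd


def reduced_ratio(width, height):
    """Reduce pixel dimensions to the smallest whole-number ratio."""
    g = gcd(width, height)
    return width // g, height // g


def label_error(label, width, height):
    """Relative error between a 'W:H' label and the true width/height ratio."""
    w, h = (int(part) for part in label.split(":"))
    true_ratio = width / height
    return abs(w / h - true_ratio) / true_ratio


exact = reduced_ratio(1152, 816)
err_4_3 = label_error("4:3", 1152, 816)
err_16_9 = label_error("16:9", 1152, 816)
print(exact, round(err_4_3, 4), round(err_16_9, 4))
```

Neither label matches 1152 x 816 exactly. 4:3 is the closer of the two, so the 16:9 value is the one to flag.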

Cosmos 3 Nano (16B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Nano (16B) · 16.1s · 2026-06-10T18:35:22+0000

Short Read

The video begins with both robotic arms positioned over the white tray containing two orange bottles. The left arm moves to grasp one bottle, lifts it out of the tray, and places it on the blue surface. Simultaneously, the right arm moves to grasp the second bottle, lifts it out of the tray, and places it on the blue surface. Both bottles are now resting on the blue fabric.

Training Caption Field

A top-down view of two robotic arms with grippers manipulating orange bottles. The arms, white with black joints and green indicator lights, pick up the bottles from a white plastic tray and place them on a blue fabric surface. The scene is brightly lit, with the robotic arms moving smoothly and precisely to complete the task.

Full output JSON/text
{
  "scene_imagination": "A top-down view of a robotic manipulation setup. Two robotic arms with grippers are positioned on either side of a white plastic tray containing two orange-colored bottles with fruit imagery on their labels. The tray rests on a blue fabric surface. The robotic arms have white and black components with green indicator lights. The arms move to pick up the bottles from the tray and place them on the blue surface.",
  "temporal_caption": "The video begins with both robotic arms positioned over the white tray containing two orange bottles. The left arm moves to grasp one bottle, lifts it out of the tray, and places it on the blue surface. Simultaneously, the right arm moves to grasp the second bottle, lifts it out of the tray, and places it on the blue surface. Both bottles are now resting on the blue fabric.",
  "audio_description": "The sound of robotic arms moving and gripping plastic bottles, followed by the soft thud of the bottles being placed on the fabric surface.",
  "subjects": [
    {
      "description": "A robotic arm with a gripper, primarily white with black joints and green indicator lights.",
      "appearance_details": "White body with black joints, green LED lights, and a black gripper mechanism.",
      "relationship": "Interacting with the bottles in the tray.",
      "location": "Left side of the frame.",
      "relative_size": "Large within frame.",
      "orientation": "Facing the tray.",
      "pose": "Extended towards the tray.",
      "action": "Grasps a bottle from the tray and places it on the blue surface.",
      "state_changes": "Moves from the tray to the blue surface, then retracts."
    },
    {
      "description": "A second robotic arm with a gripper, similar in design to the first, primarily white with black joints and green indicator lights.",
      "appearance_details": "White body with black joints, green LED lights, and a black gripper mechanism.",
      "relationship": "Interacting with the bottles in the tray.",
      "location": "Right side of the frame.",
      "relative_size": "Large within frame.",
      "orientation": "Facing the tray.",
      "pose": "Extended towards the tray.",
      "action": "Grasps a bottle from the tray and places it on the blue surface.",
      "state_changes": "Moves from the tray to the blue surface, then retracts."
    },
    {
      "description": "Two orange-colored plastic bottles with fruit imagery on their labels, containing an orange liquid.",
      "appearance_details": "Orange liquid, white caps, labels with fruit imagery.",
      "relationship": "Being manipulated by the robotic arms.",
      "location": "Initially in the tray, then on the blue surface.",
      "relative_size": "Medium within frame.",
      "orientation": "Lying horizontally.",
      "pose": "Resting in the tray, then placed on the blue surface.",
      "action": "Moved from the tray to the blue surface by the robotic arms.",
      "state_changes": "Position changes from tray to blue surface."
    }
  ],
  "background_setting": "A white plastic tray resting on a blue fabric surface. The setting appears to be an indoor workspace or laboratory environment.",
  "lighting": {
    "conditions": "Bright indoor lighting.",
    "direction": "Top-lit.",
    "shadows": "Soft shadows cast by the robotic arms and bottles.",
    "illumination_effect": "Even illumination across the scene."
  },
  "aesthetics": {
    "composition": "Centered on the tray and bottles, with robotic arms framing the scene.",
    "color_scheme": "High contrast between the white tray, orange bottles, and blue fabric.",
    "mood_atmosphere": "Clinical and precise.",
    "patterns": "None."
  },
  "cinematography": {
    "camera_angle": "Top-down view.",
    "framing": "Medium shot.",
    "depth_of_field": "Deep.",
    "focus": "Sharp focus on the tray, bottles, and robotic arms.",
    "lens_focal_length": "Standard.",
    "camera_motion": "None."
  },
  "style_medium": "Robotics video.",
  "artistic_style": "Real-world robot manipulation footage.",
  "context": "A demonstration of robotic manipulation capabilities, specifically picking up and placing objects.",
  "actions": [
    {
      "time": "0.0s-2.0s",
      "description": "The left robotic arm moves to grasp one of the bottles in the tray.",
      "actor": "Left robotic arm.",
      "object": "Bottle in the tray.",
      "contact_event": "Grippers close around the bottle.",
      "state_change": "Bottle is lifted from the tray."
    },
    {
      "time": "2.0s-3.5s",
      "description": "The left robotic arm places the bottle on the blue surface.",
      "actor": "Left robotic arm.",
      "object": "Bottle.",
      "contact_event": "Bottle is released onto the blue surface.",
      "state_change": "Bottle is now on the blue surface."
    },
    {
      "time": "2.5s-4.0s",
      "description": "The right robotic arm moves to grasp the second bottle in the tray.",
      "actor": "Right robotic arm.",
      "object": "Bottle in the tray.",
      "contact_event": "Grippers close around the bottle.",
      "state_change": "Bottle is lifted from the tray."
    },
    {
      "time": "4.0s-5.5s",
      "description": "The right robotic arm places the bottle on the blue surface.",
      "actor": "Right robotic arm.",
      "object": "Bottle.",
      "contact_event": "Bottle is released onto the blue surface.",
      "state_change": "Bottle is now on the blue surface."
    }
  ],
  "text_and_signage_elements": [
    {
      "text": "Unreadable text on bottle labels.",
      "category": "Physical_in_scene",
      "appearance": "Printed text on the bottle labels.",
      "spatial": "On the bottles.",
      "spatial_temporal": "Visible throughout the video.",
      "context": "Product labels."
    }
  ],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0.0s-2.0s",
      "description": "The left robotic arm grasps a bottle from the tray.",
      "key_changes": "Bottle is lifted from the tray.",
      "camera": "Static top-down view.",
      "robot_motion": "Left robotic arm moves to grasp the bottle.",
      "object_motion": "Bottle is lifted from the tray.",
      "physical_reasoning": "The robotic arm's grippers close around the bottle, lifting it from the tray."
    },
    {
      "segment_index": 1,
      "time_range": "2.0s-3.5s",
      "description": "The left robotic arm places the bottle on the blue surface.",
      "key_changes": "Bottle is placed on the blue surface.",
      "camera": "Static top-down view.",
      "robot_motion": "Left robotic arm moves to place the bottle.",
      "object_motion": "Bottle is placed on the blue surface.",
      "physical_reasoning": "The robotic arm's grippers release the bottle onto the blue surface."
    },
    {
      "segment_index": 2,
      "time_range": "2.5s-4.0s",
      "description": "The right robotic arm grasps the second bottle from the tray.",
      "key_changes": "Bottle is lifted from the tray.",
      "camera": "Static top-down view.",
      "robot_motion": "Right robotic arm moves to grasp the bottle.",
      "object_motion": "Bottle is lifted from the tray.",
      "physical_reasoning": "The robotic arm's grippers close around the bottle, lifting it from the tray."
    },
    {
      "segment_index": 3,
      "time_range": "4.0s-5.5s",
      "description": "The right robotic arm places the bottle on the blue surface.",
      "key_changes": "Bottle is placed on the blue surface.",
      "camera": "Static top-down view.",
      "robot_motion": "Right robotic arm moves to place the bottle.",
      "object_motion": "Bottle is placed on the blue surface.",
      "physical_reasoning": "The robotic arm's grippers release the bottle onto the blue surface."
    }
  ],
  "transitions": [
    "Continuous action with no visible cuts or transitions."
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "4:3",
  "duration": "5.767s",
  "fps": 30,
  "robotics": {
    "embodiment": "Two robotic arms with grippers.",
    "end_effector": "Grippers.",
    "manipulated_objects": "Two orange bottles.",
    "task_goal": "Pick up and place bottles from a tray to a blue surface.",
    "initial_state": "Bottles are in the tray, robotic arms are positioned over the tray.",
    "terminal_state": "Bottles are on the blue surface, robotic arms are retracted.",
    "affordances": "Bottles can be grasped by their caps.",
    "failure_modes": "Bottles may slip from the grippers or be placed incorrectly."
  },
  "physical_reasoning": "The robotic arms use their grippers to grasp the bottles, lifting them from the tray and placing them on the blue surface. The movement is smooth and controlled, demonstrating precise manipulation capabilities.",
  "uncertainties": [],
  "comprehensive_t2v_caption": "A top-down view of two robotic arms with grippers manipulating orange bottles. The arms, white with black joints and green indicator lights, pick up the bottles from a white plastic tray and place them on a blue fabric surface. The scene is brightly lit, with the robotic arms moving smoothly and precisely to complete the task."
}
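
Review note: the segment time ranges in the output above are not disjoint. Segment 1 ("2.0s-3.5s") and segment 2 ("2.5s-4.0s") share half a second. This may be intentional, since the two arms can move in parallel, but it is worth flagging. Below is a minimal sketch of such a check, assuming ranges are always written as start-end with an optional "s" suffix. The function names are hypothetical and not part of the review tooling.

```python
import re

# Matches ranges such as "2.0s-3.5s" or "0-2s" (assumed format, not a documented schema rule).
_RANGE = re.compile(r"^\s*(\d+(?:\.\d+)?)s?\s*-\s*(\d+(?:\.\d+)?)s?\s*$")


def parse_range(text):
    """Return (start, end) in seconds for a 'start-end' string."""
    match = _RANGE.match(text)
    if match is None:
        raise ValueError(f"unrecognised time range: {text!r}")
    return float(match.group(1)), float(match.group(2))


def overlapping_pairs(ranges):
    """Return list-position pairs (i, i + 1) where consecutive ranges overlap."""
    parsed = [parse_range(r) for r in ranges]
    return [
        (i, i + 1)
        for i in range(len(parsed) - 1)
        if parsed[i + 1][0] < parsed[i][1]  # next range starts before this one ends
    ]


# Time ranges of segments 1 to 3 as they appear in the output above.
print(overlapping_pairs(["2.0s-3.5s", "2.5s-4.0s", "4.0s-5.5s"]))
```

The pair reported for this input refers to list positions 0 and 1, which correspond to segment_index 1 and 2 in the output. Ranges that merely touch (4.0s end, 4.0s start) are not counted as overlapping.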

Cosmos 3 Super (64B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Super (64B) · 21.2s · 2026-06-10T16:09:22+0000

Short Read

The video begins with two robotic arms positioned on either side of a white tray containing two orange juice bottles. The left arm moves in, grasps the left bottle, and lifts it out of the tray, placing it on the blue surface below. The right arm then moves in, grasps the right bottle, and lifts it out of the tray.

Training Caption Field

Two robotic arms with green indicator lights are positioned on either side of a white plastic tray containing two orange juice bottles. The tray rests on a blue fabric surface. The left robot arm reaches into the tray, grasps one bottle, lifts it out, and places it on the blue surface. The right robot arm then reaches in, grasps the second bottle, and lifts it out of the tray. The camera remains stationary throughout, providing a top-down view of the action.

Full output JSON/text
{
  "scene_imagination": "Two robotic arms with green indicator lights are positioned on either side of a white plastic tray containing two orange juice bottles. The tray rests on a blue fabric surface. The left robot arm reaches into the tray, grasps one bottle, lifts it out, and places it on the blue surface. The right robot arm then reaches in, grasps the second bottle, and lifts it out of the tray. The camera remains stationary throughout, providing a top-down view of the action.",
  "temporal_caption": "The video begins with two robotic arms positioned on either side of a white tray containing two orange juice bottles. The left arm moves in, grasps the left bottle, and lifts it out of the tray, placing it on the blue surface below. The right arm then moves in, grasps the right bottle, and lifts it out of the tray.",
  "audio_description": "The video likely contains mechanical sounds from the robotic arms moving and grasping the bottles, as well as possible liquid sloshing sounds as the bottles are lifted.",
  "subjects": [
    {
      "description": "A robotic arm with a white and black body, featuring green indicator lights and black cables.",
      "appearance_details": "White body with black joints, green indicator lights on the wrist, black cables running along the arm.",
      "relationship": "Interacting with the orange juice bottles and the white tray.",
      "location": "Left side of the frame.",
      "relative_size": "Large within frame.",
      "orientation": "Facing the center of the frame.",
      "pose": "Extended towards the tray.",
      "action": "Reaches into the tray, grasps a bottle, lifts it out, and places it on the blue surface.",
      "state_changes": "Starts extended, moves to grasp a bottle, lifts it, and ends holding the bottle."
    },
    {
      "description": "A robotic arm with a white and black body, featuring green indicator lights and black cables.",
      "appearance_details": "White body with black joints, green indicator lights on the wrist, black cables running along the arm.",
      "relationship": "Interacting with the orange juice bottles and the white tray.",
      "location": "Right side of the frame.",
      "relative_size": "Large within frame.",
      "orientation": "Facing the center of the frame.",
      "pose": "Extended towards the tray.",
      "action": "Reaches into the tray, grasps a bottle, and lifts it out.",
      "state_changes": "Starts extended, moves to grasp a bottle, and ends lifting it."
    },
    {
      "description": "Two orange juice bottles with white caps and labels featuring orange slices.",
      "appearance_details": "Clear plastic bottles containing orange liquid, white caps, labels with orange slices and text.",
      "relationship": "Initially inside the white tray, then removed by the robotic arms.",
      "location": "Inside the white tray, then on the blue surface.",
      "relative_size": "Medium within frame.",
      "orientation": "Upright initially, then tilted as they are lifted.",
      "pose": "Standing upright in the tray, then held by the robotic arms.",
      "action": "Being grasped, lifted, and moved by the robotic arms.",
      "state_changes": "Starts in the tray, is grasped and lifted, and ends held by the robotic arms."
    },
    {
      "description": "A white plastic tray with a handle and compartments.",
      "appearance_details": "White plastic, rectangular shape with a handle on one side and compartments inside.",
      "relationship": "Holding the orange juice bottles.",
      "location": "Center of the frame.",
      "relative_size": "Large within frame.",
      "orientation": "Horizontal.",
      "pose": "Resting on the blue surface.",
      "action": "Holding the bottles, then empty as the bottles are removed.",
      "state_changes": "Starts holding two bottles, ends empty."
    }
  ],
  "background_setting": "A blue fabric surface covering a table or platform, with a white plastic tray resting on it.",
  "lighting": {
    "conditions": "Bright indoor lighting.",
    "direction": "Top-lit.",
    "shadows": "Soft shadows cast by the robotic arms and the tray.",
    "illumination_effect": "Even illumination across the scene, highlighting the objects and the robotic arms."
  },
  "aesthetics": {
    "composition": "Centered, with the tray and bottles in the middle and the robotic arms on either side.",
    "color_scheme": "High contrast between the white tray/arms, orange bottles, and blue background.",
    "mood_atmosphere": "Industrial, precise, and mechanical.",
    "patterns": "None visible."
  },
  "cinematography": {
    "camera_motion": "Static.",
    "framing": "Top-down view, wide shot.",
    "camera_angle": "Bird's-eye view.",
    "depth_of_field": "Deep.",
    "focus": "Sharp focus on the robotic arms, tray, and bottles.",
    "lens_focal_length": "Standard."
  },
  "style_medium": "Robotics video.",
  "artistic_style": "Realistic, documentary-style.",
  "context": "A demonstration of robotic manipulation, likely for educational or industrial purposes.",
  "actions": [
    {
      "time": "0-2s",
      "description": "The left robotic arm reaches into the tray and grasps the left orange juice bottle.",
      "actor": "Left robotic arm.",
      "object": "Left orange juice bottle.",
      "contact_event": "The robotic arm makes contact with the bottle and grasps it.",
      "state_change": "The bottle is lifted from the tray."
    },
    {
      "time": "2-4s",
      "description": "The left robotic arm lifts the left bottle out of the tray and places it on the blue surface.",
      "actor": "Left robotic arm.",
      "object": "Left orange juice bottle.",
      "contact_event": "The bottle is placed on the blue surface.",
      "state_change": "The bottle is now resting on the blue surface."
    },
    {
      "time": "4-6s",
      "description": "The right robotic arm reaches into the tray and grasps the right orange juice bottle.",
      "actor": "Right robotic arm.",
      "object": "Right orange juice bottle.",
      "contact_event": "The robotic arm makes contact with the bottle and grasps it.",
      "state_change": "The bottle is lifted from the tray."
    }
  ],
  "text_and_signage_elements": [
    {
      "text": "Orange juice label",
      "category": "physical_in_scene",
      "appearance": "White label with orange slices and text, printed on the bottles.",
      "spatial": "On the bottles, visible throughout the video.",
      "spatial_temporal": "Static on the bottles.",
      "context": "Product label indicating the contents of the bottles."
    }
  ],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0-2s",
      "description": "The left robotic arm reaches into the tray and grasps the left orange juice bottle.",
      "key_changes": "The left robotic arm moves to grasp the bottle.",
      "camera": "Static top-down view.",
      "robot_motion": "Left robotic arm extends and grasps the bottle.",
      "object_motion": "The left bottle is lifted from the tray.",
      "physical_logic": "The robotic arm applies force to grasp and lift the bottle."
    },
    {
      "segment_index": 1,
      "time_range": "2-4s",
      "description": "The left robotic arm lifts the left bottle out of the tray and places it on the blue surface.",
      "key_changes": "The left bottle is moved from the tray to the blue surface.",
      "camera": "Static top-down view.",
      "robot_motion": "Left robotic arm lifts and places the bottle.",
      "object_motion": "The left bottle is moved from the tray to the blue surface.",
      "physical_logic": "The robotic arm moves the bottle to a new location."
    },
    {
      "segment_index": 2,
      "time_range": "4-6s",
      "description": "The right robotic arm reaches into the tray and grasps the right orange juice bottle.",
      "key_changes": "The right robotic arm moves to grasp the bottle.",
      "camera": "Static top-down view.",
      "robot_motion": "Right robotic arm extends and grasps the bottle.",
      "object_motion": "The right bottle is lifted from the tray.",
      "physical_logic": "The robotic arm applies force to grasp and lift the bottle."
    }
  ],
  "transitions": [
    "The left robotic arm completes its action, and the right robotic arm begins its action.",
    "The right robotic arm completes its action."
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "16,9",
  "duration": "6.0",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Two robotic arms with white and black bodies, green indicator lights, and black cables.",
    "end_effector": "Grippers on the robotic arms.",
    "manipulated_objects": "Two orange juice bottles.",
    "task_goal": "To remove the orange juice bottles from the tray.",
    "initial_state": "The bottles are inside the tray, and the robotic arms are positioned on either side.",
    "terminal_state": "The bottles are removed from the tray, with one on the blue surface and the other being held by the right robotic arm.",
    "affordances": "The robotic arms have the ability to grasp and lift the bottles.",
    "failure_modes": "None observed."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The robotic arms successfully grasp the bottles and lift them out of the tray.",
    "support_and_stability": "The bottles are supported by the robotic arms and the tray.",
    "force_gravity_friction": "The robotic arms apply force to overcome gravity and friction to lift the bottles.",
    "causal_chain": "The robotic arms move to grasp the bottles, lift them, and place them in new locations.",
    "object_permanence": "The bottles remain in the scene, moving from the tray to the blue surface.",
    "next_action_prediction": "The right robotic arm will likely place the right bottle on the blue surface."
  },
  "uncertainties": [],
  "comprehensive_t2v_caption": "Two robotic arms with green indicator lights are positioned on either side of a white plastic tray containing two orange juice bottles. The tray rests on a blue fabric surface. The left robot arm reaches into the tray, grasps one bottle, lifts it out, and places it on the blue surface. The right robot arm then reaches in, grasps the second bottle, and lifts it out of the tray. The camera remains stationary throughout, providing a top-down view of the action."
}
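
Review note: the reported aspect_ratio ("16,9") does not agree with the reported resolution (1152 x 816), and the separator is a comma rather than a colon. Below is a minimal sketch of a consistency check, assuming the ratio field is always two integers separated by ":" or ",". The helper names and the 1% tolerance are illustrative choices, not part of either schema.

```python
from math import gcd, isclose


def reduced_ratio(width, height):
    """Reduce a pixel resolution to its smallest integer ratio."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


def parse_ratio(text):
    """Parse 'a:b' or 'a,b' into an integer pair (assumed field format)."""
    left, right = text.replace(",", ":").split(":")
    return int(left), int(right)


def ratio_matches(width, height, reported, rel_tol=0.01):
    """True if the reported ratio is within rel_tol of width / height."""
    a, b = parse_ratio(reported)
    return isclose(width / height, a / b, rel_tol=rel_tol)


# Values taken from the output above.
print(reduced_ratio(1152, 816))          # (24, 17)
print(ratio_matches(1152, 816, "16,9"))  # False
```

The same check also rejects "4:3" for this resolution, since 1152 / 816 is about 1.41 while 4 / 3 is about 1.33.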

Cosmos 3 Super (64B) / p09: Reproduction schema

ok
p09: Reproduction schema · Cosmos 3 Super (64B) · 20.6s · 2026-06-10T16:09:43+0000

Short Read

Two robot arms manipulate orange juice bottles, removing them from a white tray and placing them onto a blue cloth surface.

Training Caption Field

Two robot arms, one on the left and one on the right, manipulate two orange juice bottles. Initially, the bottles are positioned in a white tray on a blue cloth surface. The robot arms approach the bottles, grasp them, lift them out of the tray, and place them onto the blue cloth surface. The video captures the entire process of removing the bottles from the tray and placing them on the cloth.

Full output JSON/text
{
  "schema_version": "p09_reproduction_v1",
  "one_sentence_summary": "Two robot arms manipulate orange juice bottles, removing them from a white tray and placing them onto a blue cloth surface.",
  "video_metadata": {
    "resolution": {
      "W": 1152,
      "H": 816
    },
    "aspect_ratio": "16,9",
    "duration_sec": 5.767,
    "fps": 30.0,
    "frame_count": 173
  },
  "camera": {
    "viewpoint": "top-down",
    "framing": "medium shot",
    "camera_angle": "high angle",
    "camera_motion": "static",
    "estimated_height_or_distance": "1.5m",
    "lens_or_distortion": "none",
    "crop_boundaries": "full frame",
    "stability": "stable"
  },
  "scene_layout": {
    "coordinate_frame": "2D",
    "background": "blue cloth surface",
    "surface": "blue cloth",
    "lighting": "bright",
    "global_spatial_map": "two robot arms on left and right, white tray in center, bottles in tray",
    "occlusions": "none"
  },
  "subjects": [
    {
      "id": "robot_arm_left",
      "category": "robot arm",
      "appearance": "white and black with green lights",
      "visible_parts": "arm, gripper",
      "initial_pose": "left side, reaching towards tray",
      "final_pose": "left side, holding bottle",
      "workspace_location": "left",
      "motion_path": "left to center",
      "end_effector": "gripper",
      "uncertainties": null
    },
    {
      "id": "robot_arm_right",
      "category": "robot arm",
      "appearance": "white and black with green lights",
      "visible_parts": "arm, gripper",
      "initial_pose": "right side, reaching towards tray",
      "final_pose": "right side, holding bottle",
      "workspace_location": "right",
      "motion_path": "right to center",
      "end_effector": "gripper",
      "uncertainties": null
    }
  ],
  "objects": [
    {
      "id": "white_tray",
      "category": "tray",
      "appearance": "white plastic",
      "label_or_text": "none",
      "material": "plastic",
      "shape": "rectangular",
      "size_relative_to_frame": "medium",
      "initial_pose": "center",
      "final_pose": "center",
      "state_changes": "bottles removed",
      "uncertainties": null
    },
    {
      "id": "bottle_left",
      "category": "bottle",
      "appearance": "orange liquid, label",
      "label_or_text": "none",
      "material": "plastic",
      "shape": "cylindrical",
      "size_relative_to_frame": "small",
      "initial_pose": "in tray",
      "final_pose": "on cloth",
      "state_changes": "moved from tray to cloth",
      "uncertainties": null
    },
    {
      "id": "bottle_right",
      "category": "bottle",
      "appearance": "orange liquid, label",
      "label_or_text": "none",
      "material": "plastic",
      "shape": "cylindrical",
      "size_relative_to_frame": "small",
      "initial_pose": "in tray",
      "final_pose": "on cloth",
      "state_changes": "moved from tray to cloth",
      "uncertainties": null
    }
  ],
  "initial_state": "Two robot arms are positioned on either side of a white tray containing two orange juice bottles. The tray rests on a blue cloth surface.",
  "final_state": "Both robot arms have removed the bottles from the tray and placed them on the blue cloth surface. The tray is now empty.",
  "timeline": [
    {
      "time_range": "0-2s",
      "description": "Robot arms approach the bottles in the tray.",
      "active_subjects": [
        "robot_arm_left",
        "robot_arm_right"
      ],
      "object_states": {
        "bottle_left": "in tray",
        "bottle_right": "in tray"
      },
      "robot_motion": "approaching",
      "object_motion": "none",
      "gripper_state": "open"
    },
    {
      "time_range": "2-4s",
      "description": "Robot arms grasp the bottles.",
      "active_subjects": [
        "robot_arm_left",
        "robot_arm_right"
      ],
      "object_states": {
        "bottle_left": "grasped",
        "bottle_right": "grasped"
      },
      "robot_motion": "grasping",
      "object_motion": "none",
      "gripper_state": "closed"
    },
    {
      "time_range": "4-6s",
      "description": "Robot arms lift the bottles out of the tray and place them on the blue cloth.",
      "active_subjects": [
        "robot_arm_left",
        "robot_arm_right"
      ],
      "object_states": {
        "bottle_left": "on cloth",
        "bottle_right": "on cloth"
      },
      "robot_motion": "lifting, placing",
      "object_motion": "moving",
      "gripper_state": "closed"
    }
  ],
  "parallel_action_graph": {
    "simultaneous_actions": [
      {
        "action_1": "robot_arm_left grasps bottle_left",
        "action_2": "robot_arm_right grasps bottle_right"
      },
      {
        "action_1": "robot_arm_left lifts bottle_left",
        "action_2": "robot_arm_right lifts bottle_right"
      },
      {
        "action_1": "robot_arm_left places bottle_left on cloth",
        "action_2": "robot_arm_right places bottle_right on cloth"
      }
    ],
    "dependencies": [],
    "ordering_constraints": []
  },
  "keyframes": [
    {
      "time": 0,
      "frame_description": "Robot arms approach the bottles in the tray.",
      "subject_poses": {
        "robot_arm_left": "left side, reaching towards tray",
        "robot_arm_right": "right side, reaching towards tray"
      },
      "object_poses": {
        "bottle_left": "in tray",
        "bottle_right": "in tray"
      },
      "contacts": [],
      "important_visual_details": "none"
    },
    {
      "time": 2,
      "frame_description": "Robot arms grasp the bottles.",
      "subject_poses": {
        "robot_arm_left": "left side, grasping bottle_left",
        "robot_arm_right": "right side, grasping bottle_right"
      },
      "object_poses": {
        "bottle_left": "grasped",
        "bottle_right": "grasped"
      },
      "contacts": [
        {
          "subject_id": "robot_arm_left",
          "object_id": "bottle_left",
          "contact_type": "grasp"
        },
        {
          "subject_id": "robot_arm_right",
          "object_id": "bottle_right",
          "contact_type": "grasp"
        }
      ],
      "important_visual_details": "none"
    },
    {
      "time": 4,
      "frame_description": "Robot arms lift the bottles out of the tray.",
      "subject_poses": {
        "robot_arm_left": "left side, lifting bottle_left",
        "robot_arm_right": "right side, lifting bottle_right"
      },
      "object_poses": {
        "bottle_left": "being lifted",
        "bottle_right": "being lifted"
      },
      "contacts": [
        {
          "subject_id": "robot_arm_left",
          "object_id": "bottle_left",
          "contact_type": "grasp"
        },
        {
          "subject_id": "robot_arm_right",
          "object_id": "bottle_right",
          "contact_type": "grasp"
        }
      ],
      "important_visual_details": "none"
    },
    {
      "time": 6,
      "frame_description": "Robot arms place the bottles on the blue cloth.",
      "subject_poses": {
        "robot_arm_left": "left side, placing bottle_left",
        "robot_arm_right": "right side, placing bottle_right"
      },
      "object_poses": {
        "bottle_left": "on cloth",
        "bottle_right": "on cloth"
      },
      "contacts": [
        {
          "subject_id": "robot_arm_left",
          "object_id": "bottle_left",
          "contact_type": "place"
        },
        {
          "subject_id": "robot_arm_right",
          "object_id": "bottle_right",
          "contact_type": "place"
        }
      ],
      "important_visual_details": "none"
    }
  ],
  "contact_events": [
    {
      "time": 2,
      "subject_id": "robot_arm_left",
      "object_id": "bottle_left",
      "contact_type": "grasp",
      "contact_location": "bottle_left",
      "grasp_or_support_quality": "secure",
      "release_or_transfer": "none",
      "visible_evidence": "none"
    },
    {
      "time": 2,
      "subject_id": "robot_arm_right",
      "object_id": "bottle_right",
      "contact_type": "grasp",
      "contact_location": "bottle_right",
      "grasp_or_support_quality": "secure",
      "release_or_transfer": "none",
      "visible_evidence": "none"
    },
    {
      "time": 4,
      "subject_id": "robot_arm_left",
      "object_id": "bottle_left",
      "contact_type": "lift",
      "contact_location": "bottle_left",
      "grasp_or_support_quality": "secure",
      "release_or_transfer": "none",
      "visible_evidence": "none"
    },
    {
      "time": 4,
      "subject_id": "robot_arm_right",
      "object_id": "bottle_right",
      "contact_type": "lift",
      "contact_location": "bottle_right",
      "grasp_or_support_quality": "secure",
      "release_or_transfer": "none",
      "visible_evidence": "none"
    },
    {
      "time": 6,
      "subject_id": "robot_arm_left",
      "object_id": "bottle_left",
      "contact_type": "place",
      "contact_location": "blue cloth",
      "grasp_or_support_quality": "secure",
      "release_or_transfer": "none",
      "visible_evidence": "none"
    },
    {
      "time": 6,
      "subject_id": "robot_arm_right",
      "object_id": "bottle_right",
      "contact_type": "place",
      "contact_location": "blue cloth",
      "grasp_or_support_quality": "secure",
      "release_or_transfer": "none",
      "visible_evidence": "none"
    }
  ],
  "physics_and_affordances": {
    "support_surfaces": [
      "blue cloth",
      "white tray"
    ],
    "gravity_and_stability": "stable",
    "force_or_constraint_reasoning": "none",
    "collision_or_near_collision": "none",
    "object_permanence": "none",
    "task_goal": "remove bottles from tray",
    "failure_modes": [],
    "next_action_prediction": "none"
  },
  "appearance_fidelity": {
    "colors": [
      "white",
      "black",
      "green",
      "orange",
      "blue"
    ],
    "textures": [
      "plastic",
      "cloth"
    ],
    "lighting_shadows": "bright",
    "robot_morphology": "white and black with green lights",
    "object_fine_details": "none",
    "background_fine_details": "none"
  },
  "text_and_labels": [
    {
      "object_id": "bottle_left",
      "raw_text": "none",
      "legibility": "illegible",
      "appearance": "label",
      "spatial_location": "bottle_left",
      "time_visible": "0-6s"
    },
    {
      "object_id": "bottle_right",
      "raw_text": "none",
      "legibility": "illegible",
      "appearance": "label",
      "spatial_location": "bottle_right",
      "time_visible": "0-6s"
    }
  ],
  "audio_description": "none",
  "generation_controls": {
    "shot_type": "medium shot",
    "camera_motion_instruction": "static",
    "motion_style": "none",
    "duration_instruction": "5.767s",
    "spatial_consistency_requirements": "none",
    "negative_constraints": [],
    "recommended_caption_style": "none"
  },
  "uncertainties": [],
  "comprehensive_reproduction_caption": "Two robot arms, one on the left and one on the right, manipulate two orange juice bottles. Initially, the bottles are positioned in a white tray on a blue cloth surface. The robot arms approach the bottles, grasp them, lift them out of the tray, and place them onto the blue cloth surface. The video captures the entire process of removing the bottles from the tray and placing them on the cloth."
}
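
Review note: in the video_metadata above, frame_count (173) is consistent with duration_sec (5.767) and fps (30.0), but the keyframes and contact_events include entries at time 6, which lies past the reported duration. Below is a minimal sketch of both checks. The function names are hypothetical, and rounding to the nearest frame is an assumption about how frame_count was derived.

```python
def expected_frame_count(duration_sec, fps):
    """Frame count implied by duration and frame rate, rounded to the nearest frame."""
    return round(duration_sec * fps)


def times_beyond_duration(times, duration_sec):
    """Return the timestamps (in seconds) that fall after the end of the clip."""
    return [t for t in times if t > duration_sec]


# Values taken from the output above.
print(expected_frame_count(5.767, 30.0))              # 173
print(times_beyond_duration([0, 2, 4, 6], 5.767))     # [6]
```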

rovid_001

Qwen 3.5 9B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 9B · 105.2s · 2026-06-10T17:42:14+0000

Short Read

At the beginning, a robotic arm approaches a black cloth lying on a blue mat next to a black rectangular object. A human hand in a blue glove holds the object steady. In the middle, the robot's gripper, wrapped in black fabric, grasps the cloth and lifts it slightly, attempting to cover or interact with the black object underneath. By the end, the robot continues to manipulate the cloth over the object, adjusting its position while the human hand maintains its grip on the object.

Training Caption Field

In a laboratory setting, a robotic arm with a white and black gripper approaches a black cloth lying on a blue mat. A human hand wearing a blue glove holds a black rectangular object steady on the mat. The robot grasps the cloth and lifts it, attempting to cover or interact with the black object underneath. The robot continues to manipulate the cloth, adjusting its position and bunching it up over the object, while the human hand maintains its grip on the object. The scene is brightly lit, with various other objects like a Rubik's cube and building blocks visible on the table to the right.

Full output JSON/text
{
  "scene_imagination": "A high-angle, static shot captures a white table in a laboratory or testing environment. On the left, a blue felt mat holds a black rectangular object (possibly a battery or electronic component) and a black cloth. A robotic arm with a white and black gripper is positioned on the left, interacting with the cloth. A human hand wearing a blue glove is visible, holding the black object steady. To the right, various items are arranged, including a red crate with a blue ball, a Rubik's cube, and a stack of large, colorful building blocks (red and blue). The background shows a grey floor with a blue line and the legs of people standing nearby.",
  "temporal_caption": "At the beginning, a robotic arm approaches a black cloth lying on a blue mat next to a black rectangular object. A human hand in a blue glove holds the object steady. In the middle, the robot's gripper, wrapped in black fabric, grasps the cloth and lifts it slightly, attempting to cover or interact with the black object underneath. By the end, the robot continues to manipulate the cloth over the object, adjusting its position while the human hand maintains its grip on the object.",
  "audio_description": null,
  "subjects": [
    {
      "description": "Robotic arm with gripper",
      "appearance_details": "Black articulated arm with a white and black end-effector/gripper. The gripper appears to be wrapped in a black cloth or soft material.",
      "relationship": "Interacting with the black cloth and the object on the mat.",
      "location": "Left side of the table, on the blue mat.",
      "relative_size": "Large compared to the objects on the mat.",
      "orientation": "Angled towards the center of the mat.",
      "pose": "Extended, with the gripper hovering over and then touching the cloth.",
      "action": "Grasping and manipulating the black cloth.",
      "state_changes": "Moves from a resting position to actively grasping and lifting the cloth."
    },
    {
      "description": "Human hand",
      "appearance_details": "Wearing a light blue disposable glove.",
      "relationship": "Holding the black rectangular object steady.",
      "location": "Center-left, above the blue mat.",
      "relative_size": "Standard human hand size.",
      "orientation": "Vertical, fingers pointing down.",
      "pose": "Stable, holding the object.",
      "action": "Holding the black object.",
      "state_changes": "Remains relatively static throughout the clip."
    },
    {
      "description": "Black rectangular object",
      "appearance_details": "Shiny, black, rectangular, possibly a battery or electronic component with a cable attached.",
      "relationship": "Being held by the human hand and partially covered by the robot's cloth.",
      "location": "On the blue mat.",
      "relative_size": "Medium-sized, roughly the size of a smartphone or small battery pack.",
      "orientation": "Horizontal.",
      "pose": "Stationary.",
      "action": "Being held and partially obscured.",
      "state_changes": "Position remains constant, but its visibility changes as the cloth moves."
    },
    {
      "description": "Black cloth",
      "appearance_details": "Soft, black fabric.",
      "relationship": "Being manipulated by the robot.",
      "location": "On the blue mat, over the black object.",
      "relative_size": "Small, roughly the size of the object.",
      "orientation": "Lying flat, then lifted.",
      "pose": "Flat, then bunched up as it is lifted.",
      "action": "Being grasped and lifted by the robot.",
      "state_changes": "Moves from a flat state to being lifted and manipulated by the gripper."
    }
  ],
  "background_setting": "An indoor setting, likely a robotics lab or testing facility. The floor is grey with a blue tape line. Other people's legs and shoes are visible in the background, suggesting a collaborative or demonstration environment.",
  "lighting": {
    "conditions": "Bright, even indoor lighting.",
    "direction": "Overhead, casting soft shadows.",
    "shadows": "Soft shadows are visible under the objects on the table.",
    "illumination_effect": "Clear visibility of all objects and actions without harsh glare."
  },
  "aesthetics": {
    "composition": "High-angle, wide shot capturing the entire workspace.",
    "color_scheme": "Neutral tones (white, grey, black) with pops of color from the objects (blue, red, green).",
    "mood_atmosphere": "Clinical, focused, and experimental.",
    "patterns": "None significant, except for the grid-like structure of the table legs and the Rubik's cube."
  },
  "cinematography": {
    "framing": "Wide shot, capturing the full table and surrounding area.",
    "camera_angle": "High angle, looking down at the table.",
    "depth_of_field": "Deep focus, keeping most of the scene in sharp focus.",
    "focus": "Sharp focus on the table surface and the objects.",
    "lens_focal_length": "Unknown, likely a standard or slightly wide lens for the high-angle view.",
    "camera_motion": "Static, no camera movement."
  },
  "style_medium": "robotics video",
  "artistic_style": "real-world robot manipulation footage",
  "context": "A demonstration or test of a robot's ability to manipulate soft objects (the cloth) while a human assists by holding a rigid object (the black component). This could be part of a dexterity or grasping task evaluation.",
  "actions": [
    {
      "time": "00:00 - 00:02",
      "description": "The robotic arm approaches the black cloth on the blue mat. The human hand is already holding the black object.",
      "actor": "Robotic arm",
      "object": "Black cloth",
      "contact_event": "The gripper makes contact with the cloth.",
      "state_change": "The cloth begins to be lifted.",
      "evidence": "The cloth moves upwards as the gripper closes around it."
    },
    {
      "time": "00:02 - 00:08",
      "description": "The robot lifts the cloth, attempting to cover or interact with the black object underneath.",
      "actor": "Robotic arm",
      "object": "Black cloth and black object",
      "contact_event": "The cloth is lifted and manipulated over the object.",
      "state_change": "The cloth is bunched up and moved around the object.",
      "evidence": "The cloth's position changes significantly as the robot adjusts its grip."
    },
    {
      "time": "00:08 - 00:12",
      "description": "The robot continues to manipulate the cloth, trying to stabilize or cover the object.",
      "actor": "Robotic arm",
      "object": "Black cloth",
      "contact_event": "Continuous contact and manipulation.",
      "state_change": "The cloth remains in a bunched state over the object.",
      "evidence": "The robot's gripper remains engaged with the cloth, making small adjustments."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "00:00 - 00:02",
      "description": "Initial approach and contact with the cloth.",
      "key_changes": "Robot arm moves into position and touches the cloth.",
      "camera": "Static.",
      "robot_motion": "Approach and initial grasp.",
      "object_motion": "Cloth begins to lift.",
      "physical_logic": "The robot identifies the cloth as a target for manipulation."
    },
    {
      "segment_index": 2,
      "time_range": "00:02 - 00:08",
      "description": "Active manipulation of the cloth over the object.",
      "key_changes": "The cloth is lifted and moved around the object.",
      "camera": "Static.",
      "robot_motion": "Lifting and adjusting the cloth.",
      "object_motion": "Cloth is bunched and moved.",
      "physical_logic": "The robot attempts to grasp or cover the object using the cloth."
    },
    {
      "segment_index": 3,
      "time_range": "00:08 - 00:12",
      "description": "Continued manipulation and stabilization.",
      "key_changes": "The robot makes fine adjustments to the cloth.",
      "camera": "Static.",
      "robot_motion": "Fine-tuning the grasp.",
      "object_motion": "Minimal motion, mostly stabilization.",
      "physical_logic": "The robot tries to secure the cloth over the object."
    }
  ],
  "transitions": [
    "Approach to contact",
    "Lifting and manipulating",
    "Stabilizing and adjusting"
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "16:9",
  "duration": "12.433s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Robotic arm with a custom gripper.",
    "end_effector": "White and black gripper, possibly wrapped in cloth.",
    "manipulated_objects": "Black cloth, black rectangular object.",
    "task_goal": "To manipulate the cloth over the black object, possibly to cover it or perform a specific task.",
    "initial_state": "Robot arm is stationary, cloth is flat on the mat.",
    "terminal_state": "Robot arm is holding the cloth over the object, human hand is still holding the object.",
    "affordances": "The cloth provides a soft, deformable surface for the robot to grasp.",
    "failure_modes": "The robot might struggle to grasp the cloth securely or might not be able to cover the object completely."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The robot uses its gripper to grasp the cloth, which is a soft and deformable object.",
    "support_and_stability": "The human hand provides stability to the black object, preventing it from moving while the robot manipulates the cloth.",
    "force_gravity_friction": "Gravity pulls the cloth down, while the robot's grip and the human's hold counteract this force.",
    "causal_chain": "The robot's action of grasping the cloth causes it to lift and move, which in turn affects the visibility and position of the black object.",
    "object_permanence": "The black object and the cloth remain in the scene throughout the video.",
    "next_action_prediction": "The robot might continue to manipulate the cloth, possibly trying to fold it or secure it around the object."
  },
  "uncertainties": [
    "The exact purpose of the cloth manipulation is unclear.",
    "The specific task goal of the robot is not explicitly stated."
  ],
  "comprehensive_t2v_caption": "In a laboratory setting, a robotic arm with a white and black gripper approaches a black cloth lying on a blue mat. A human hand wearing a blue glove holds a black rectangular object steady on the mat. The robot grasps the cloth and lifts it, attempting to cover or interact with the black object underneath. The robot continues to manipulate the cloth, adjusting its position and bunching it up over the object, while the human hand maintains its grip on the object. The scene is brightly lit, with various other objects like a Rubik's cube and building blocks visible on the table to the right."
}
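
Reviewer note: the output above reports a resolution of 1152 x 816 together with "aspect_ratio": "16:9". These two fields can be cross-checked mechanically. The sketch below is a minimal helper, not part of the captioning pipeline; the function names `reduced_ratio` and `ratio_matches` and the 1% tolerance are assumptions made for illustration.

```python
from math import gcd


def reduced_ratio(width: int, height: int) -> tuple[int, int]:
    """Reduce width:height to lowest terms."""
    d = gcd(width, height)
    return width // d, height // d


def ratio_matches(label: str, width: int, height: int, tol: float = 0.01) -> bool:
    """Return True if a 'W:H' label agrees with the pixel dimensions.

    tol is a relative tolerance (an assumed value, chosen for illustration).
    """
    num, den = (int(part) for part in label.split(":"))
    actual = width / height
    return abs(num / den - actual) <= tol * actual


if __name__ == "__main__":
    # Values copied from the "resolution" and "aspect_ratio" fields.
    print(reduced_ratio(1152, 816))
    print(ratio_matches("16:9", 1152, 816))
```

For 1152 x 816 the reduced ratio is 24:17 (about 1.41), so a "16:9" label (about 1.78) does not agree with the reported resolution.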

Qwen 3.5 27B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 27B · 173.8s · 2026-06-10T17:59:09+0000

Short Read

At the beginning, a human hand in a blue glove is positioned over a black object on a grey mat. A robotic arm is visible on the left. In the early phase, the robotic arm moves its gripper towards the black object. In the middle section, the gripper makes contact with the object and begins to manipulate it, lifting and rotating it slightly. Towards the end, the robot continues to adjust its grip and position on the object while the human hand remains stationary.

Training Caption Field

A high-angle static shot captures a white laboratory table where a black robotic arm with a white and black gripper interacts with a black rectangular object on a grey mat. A human hand in a blue glove is also present, observing the action. The robot arm approaches the object, successfully grasps it with its gripper, and then lifts and slightly rotates it. The table is cluttered with various items including a purple bowl, an orange crate with a blue ball, stacked blocks with cups, and a Rubik's cube, suggesting a complex manipulation task in a research setting. The lighting is bright and even, ensuring clear visibility of the robot's precise movements and the object's state changes.

Full output JSON/text
{
  "scene_imagination": "A high-angle, static shot captures a white laboratory table cluttered with various objects. On the left, a grey mat holds a black rectangular object, a purple bowl with white clips, and a bag of screws. A human hand wearing a blue nitrile glove rests near the black object. A black robotic arm with a white and black gripper enters from the left. On the right side of the table, there is an orange crate with a blue ball and a small robot, a stack of grey blocks with blue and red cups, a Rubik's cube, and a disassembled electronic device. The background shows a grey floor with a blue tape line and the feet of people standing nearby.",
  "temporal_caption": "At the beginning, a human hand in a blue glove is positioned over a black object on a grey mat. A robotic arm is visible on the left. In the early phase, the robotic arm moves its gripper towards the black object. In the middle section, the gripper makes contact with the object and begins to manipulate it, lifting and rotating it slightly. Towards the end, the robot continues to adjust its grip and position on the object while the human hand remains stationary.",
  "audio_description": null,
  "subjects": [
    {
      "description": "A robotic arm with a black body and a white and black gripper.",
      "appearance_details": "The arm is black with visible joints and wiring. The gripper has a white central part and black pincer-like fingers.",
      "relationship": "Interacting with the black object on the table.",
      "location": "Enters from the left side of the frame.",
      "relative_size": "Medium-sized, comparable to the objects on the table.",
      "orientation": "Oriented towards the center of the table.",
      "pose": "Extended arm with gripper open, then closing on an object.",
      "action": "Approaching, grasping, and manipulating a black object.",
      "state_changes": "Moves from a resting position to an active manipulation state."
    },
    {
      "description": "A human hand wearing a blue nitrile glove.",
      "appearance_details": "The hand is covered by a light blue, form-fitting glove.",
      "relationship": "Observing or supervising the robot's action.",
      "location": "Positioned over the grey mat on the left side of the table.",
      "relative_size": "Standard human hand size.",
      "orientation": "Fingers pointing down towards the black object.",
      "pose": "Static, with fingers slightly curled.",
      "action": "Remaining stationary near the object.",
      "state_changes": "None."
    },
    {
      "description": "A black rectangular object.",
      "appearance_details": "The object is black, glossy, and appears to be a casing or a tray with some internal components visible.",
      "relationship": "The primary object being manipulated by the robot.",
      "location": "Centered on the grey mat.",
      "relative_size": "Medium-sized, fitting within the robot's gripper.",
      "orientation": "Lying flat on the mat.",
      "pose": "Stationary until manipulated.",
      "action": "Being grasped and moved by the robot.",
      "state_changes": "Changes position and orientation due to the robot's manipulation."
    }
  ],
  "background_setting": "A laboratory or workshop environment with a white table, grey flooring, and various technical equipment and objects scattered around.",
  "lighting": {
    "conditions": "Bright, even indoor lighting.",
    "direction": "Overhead, casting minimal shadows.",
    "shadows": "Soft shadows directly beneath objects.",
    "illumination_effect": "Clear visibility of all objects and actions without harsh contrasts."
  },
  "aesthetics": {
    "composition": "High-angle shot with a cluttered but organized tabletop layout.",
    "color_scheme": "Neutral tones (white, grey, black) with pops of color from objects (blue, red, orange).",
    "mood_atmosphere": "Clinical, technical, and focused.",
    "patterns": "No distinct patterns; functional arrangement of items."
  },
  "cinematography": {
    "framing": "Medium shot capturing the entire tabletop workspace.",
    "camera_angle": "High-angle, looking down at the table.",
    "depth_of_field": "Deep, keeping most of the table in focus.",
    "focus": "Sharp focus on the central area of interaction.",
    "lens_focal_length": "Unknown, likely a standard or slightly wide lens.",
    "camera_motion": "Static camera with no movement."
  },
  "style_medium": "robotics video",
  "artistic_style": "real-world robot manipulation footage",
  "context": "Inferred to be a demonstration or test of a robot's ability to grasp and manipulate a specific object in a cluttered environment, possibly for a research or educational purpose.",
  "actions": [
    {
      "time": "00:00 - 00:02",
      "description": "The robotic arm approaches the black object on the grey mat.",
      "actor": "Robotic arm",
      "object": "Black rectangular object",
      "contact_event": "No contact yet.",
      "state_change": "Robot moves into position.",
      "evidence": "Visual movement of the arm towards the object."
    },
    {
      "time": "00:03 - 00:06",
      "description": "The robot's gripper closes around the black object.",
      "actor": "Robotic arm",
      "object": "Black rectangular object",
      "contact_event": "Gripper makes contact and grasps the object.",
      "state_change": "Object is secured by the gripper.",
      "evidence": "Gripper fingers close around the object."
    },
    {
      "time": "00:07 - 00:12",
      "description": "The robot lifts and slightly rotates the black object.",
      "actor": "Robotic arm",
      "object": "Black rectangular object",
      "contact_event": "Maintained grasp.",
      "state_change": "Object's position and orientation change.",
      "evidence": "Object is visibly moved and tilted by the robot."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "00:00 - 00:02",
      "description": "Initial setup with the robot arm approaching the target object.",
      "key_changes": "Robot arm enters the frame and moves towards the object.",
      "camera": "Static.",
      "robot_motion": "Approach.",
      "object_motion": "None.",
      "physical_logic": "Robot positions itself for grasping."
    },
    {
      "segment_index": 2,
      "time_range": "00:03 - 00:06",
      "description": "The robot successfully grasps the black object.",
      "key_changes": "Gripper closes on the object.",
      "camera": "Static.",
      "robot_motion": "Grasping.",
      "object_motion": "None.",
      "physical_logic": "Contact and secure hold established."
    },
    {
      "segment_index": 3,
      "time_range": "00:07 - 00:12",
      "description": "The robot manipulates the object by lifting and rotating it.",
      "key_changes": "Object is moved and reoriented.",
      "camera": "Static.",
      "robot_motion": "Manipulation.",
      "object_motion": "Lifted and rotated.",
      "physical_logic": "Force applied to change object's state."
    }
  ],
  "transitions": [
    "Robot approaches object.",
    "Robot grasps object.",
    "Robot manipulates object."
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "16:9",
  "duration": "12.433s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Black robotic arm with a white and black gripper.",
    "end_effector": "Two-fingered gripper.",
    "manipulated_objects": "Black rectangular object.",
    "task_goal": "Grasp and manipulate the black object.",
    "initial_state": "Object stationary on the mat, robot arm approaching.",
    "terminal_state": "Object held and slightly reoriented by the robot.",
    "affordances": "Gripping, lifting, rotating.",
    "failure_modes": "Potential slippage or incorrect grasp orientation."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The gripper successfully makes contact and secures a hold on the object.",
    "support_and_stability": "The object is supported by the gripper after being lifted from the mat.",
    "force_gravity_friction": "The robot applies sufficient force to overcome gravity and friction to lift and move the object.",
    "causal_chain": "Robot movement leads to contact, which leads to grasping, which leads to object manipulation.",
    "object_permanence": "The object remains visible and consistent throughout the interaction.",
    "next_action_prediction": "The robot may continue to manipulate the object or place it elsewhere."
  },
  "uncertainties": [
    "The exact nature of the black object is unclear.",
    "The purpose of the human hand's presence is not explicitly clear."
  ],
  "comprehensive_t2v_caption": "A high-angle static shot captures a white laboratory table where a black robotic arm with a white and black gripper interacts with a black rectangular object on a grey mat. A human hand in a blue glove is also present, observing the action. The robot arm approaches the object, successfully grasps it with its gripper, and then lifts and slightly rotates it. The table is cluttered with various items including a purple bowl, an orange crate with a blue ball, stacked blocks with cups, and a Rubik's cube, suggesting a complex manipulation task in a research setting. The lighting is bright and even, ensuring clear visibility of the robot's precise movements and the object's state changes."
}
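
Reviewer note: the "actions" and "segments" entries above use whole-second ranges ("00:00 - 00:02", "00:03 - 00:06", "00:07 - 00:12") against a reported duration of "12.433s". The sketch below shows one way to parse such ranges and list the intervals they leave uncovered. The helper names are hypothetical, and the parser only assumes the range styles that appear in this review ("00:00 - 00:02", "0:00-0:03", "0-2s"). Whether a one-second step between ranges counts as a real gap or as inclusive whole-second labelling is a judgment call that this code does not settle.

```python
import re


def to_seconds(token: str) -> float:
    """Convert 'MM:SS', 'M:SS', or a plain seconds value like '2s' to seconds."""
    token = token.strip().rstrip("s")
    if ":" in token:
        minutes, seconds = token.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(token)


def parse_range(text: str) -> tuple[float, float]:
    """Split 'start - end' (with or without spaces) into a pair of seconds."""
    start, end = re.split(r"\s*-\s*", text.strip(), maxsplit=1)
    return to_seconds(start), to_seconds(end)


def find_gaps(ranges: list[str], duration: float) -> list[tuple[float, float]]:
    """Return the intervals in [0, duration] not covered by any range."""
    gaps = []
    cursor = 0.0
    for start, end in sorted(parse_range(r) for r in ranges):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if duration > cursor:
        gaps.append((cursor, duration))
    return gaps


if __name__ == "__main__":
    # Ranges copied from the "segments" list above; duration from "duration".
    segments = ["00:00 - 00:02", "00:03 - 00:06", "00:07 - 00:12"]
    for gap in find_gaps(segments, 12.433):
        print(gap)
```

Run on the ranges above, this reports three uncovered intervals: 2 to 3 s, 6 to 7 s, and the tail from 12 s to 12.433 s.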

Cosmos 3 Nano (16B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Nano (16B) · 14.4s · 2026-06-10T18:35:36+0000

Short Read

The video begins with a static top-down view of a table with various objects. A person wearing blue gloves is seen holding a black object on a blue mat. A robotic arm with a black and white gripper enters from the left and begins to interact with the black object. The robotic arm moves the black object slightly, adjusting its position on the mat. The person continues to hold the object steady while the robotic arm manipulates it. The scene remains focused on the interaction between the robotic arm and the black object throughout the video.

Training Caption Field

A top-down view of a white table with a blue mat on the left side and various objects on the right side. A person wearing blue gloves is seen holding a black object on the blue mat. A robotic arm with a black and white gripper enters from the left and begins to interact with the black object. The robotic arm moves the black object slightly, adjusting its position on the mat. The person continues to hold the object steady while the robotic arm manipulates it. The scene remains focused on the interaction between the robotic arm and the black object throughout the video.

Full output JSON/text
{
  "scene_imagination": "A top-down view of a white table with a blue mat on the left side and various objects on the right. A person wearing blue gloves is manipulating a black object on the blue mat, while a robotic arm with a black and white gripper enters from the left and interacts with the object. The table holds a Rubik's cube, blue cups, red blocks, an orange box with a blue ball, a pink bowl, a white plastic bag, and an electronic device with wires. The background shows a gray floor with blue tape markings and the lower legs of people standing around.",
  "temporal_caption": "The video begins with a static top-down view of a table with various objects. A person wearing blue gloves is seen holding a black object on a blue mat. A robotic arm with a black and white gripper enters from the left and begins to interact with the black object. The robotic arm moves the black object slightly, adjusting its position on the mat. The person continues to hold the object steady while the robotic arm manipulates it. The scene remains focused on the interaction between the robotic arm and the black object throughout the video.",
  "audio_description": "The video does not provide any audio information.",
  "subjects": [
    {
      "description": "A person wearing blue gloves, manipulating a black object on a blue mat.",
      "appearance_details": "Blue gloves, partially visible arms and torso.",
      "relationship": "Interacting with the black object and the robotic arm.",
      "location": "Center-left of the frame.",
      "relative_size": "Medium",
      "orientation": "Facing the table.",
      "pose": "Standing, arms extended towards the table.",
      "action": "Holding and adjusting a black object on the blue mat.",
      "state_changes": "The person's hands move slightly as they adjust the black object.",
      "clothing": "Blue gloves, partially visible light-colored shirt and khaki pants.",
      "expression": "Not visible.",
      "gender": "Unknown",
      "age": "Unknown",
      "skin_tone_and_texture": "Not visible.",
      "number_of_subjects": 1,
      "number_of_arms": 2,
      "number_of_legs": 2
    },
    {
      "description": "A robotic arm with a black and white gripper.",
      "appearance_details": "Black and white mechanical arm with a gripper at the end.",
      "relationship": "Interacting with the black object held by the person.",
      "location": "Left side of the frame.",
      "relative_size": "Medium",
      "orientation": "Facing the table.",
      "pose": "Extended towards the table.",
      "action": "Moving towards the black object and adjusting its position.",
      "state_changes": "The robotic arm enters the frame and begins to manipulate the black object.",
      "clothing": "None",
      "expression": "None",
      "gender": "None",
      "age": "None",
      "skin_tone_and_texture": "None",
      "number_of_subjects": 1,
      "number_of_arms": 0,
      "number_of_legs": 0
    }
  ],
  "background_setting": "A white table with a blue mat on the left side and various objects on the right side. The table is surrounded by a gray floor with blue tape markings. The lower legs of people are visible in the background.",
  "lighting": {
    "conditions": "Bright indoor lighting.",
    "direction": "Top-lit.",
    "shadows": "Soft shadows on the table and floor.",
    "illumination_effect": "Even illumination across the scene."
  },
  "aesthetics": {
    "composition": "Centered on the table and the interaction between the person and the robotic arm.",
    "color_scheme": "Neutral tones with pops of color from the objects on the table.",
    "mood_atmosphere": "Focused and technical.",
    "patterns": "None"
  },
  "cinematography": {
    "camera_angle": "Top-down view.",
    "framing": "Wide shot.",
    "depth_of_field": "Deep.",
    "focus": "Sharp focus on the table and objects.",
    "lens_focal_length": "Standard.",
    "camera_motion": "None"
  },
  "style_medium": "Robotics video",
  "artistic_style": "Real-world robot manipulation footage",
  "context": "A demonstration of a robotic arm interacting with a person in a controlled environment.",
  "actions": [
    {
      "time": "0:00-0:03",
      "description": "The person holds a black object on the blue mat.",
      "actor": "Person",
      "object": "Black object",
      "contact_event": "The person's hands are in contact with the black object.",
      "state_change": "The person adjusts the position of the black object.",
      "evidence": "The person's hands move slightly."
    },
    {
      "time": "0:03-0:06",
      "description": "The robotic arm enters the frame and begins to interact with the black object.",
      "actor": "Robotic arm",
      "object": "Black object",
      "contact_event": "The robotic arm's gripper makes contact with the black object.",
      "state_change": "The robotic arm manipulates the black object.",
      "evidence": "The robotic arm moves towards the black object."
    },
    {
      "time": "0:06-0:12",
      "description": "The robotic arm continues to adjust the position of the black object while the person holds it steady.",
      "actor": "Robotic arm",
      "object": "Black object",
      "contact_event": "The robotic arm's gripper remains in contact with the black object.",
      "state_change": "The black object is adjusted slightly.",
      "evidence": "The robotic arm moves the black object."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0:00-0:03",
      "description": "The person holds a black object on the blue mat.",
      "key_changes": "None",
      "camera": "Static top-down view.",
      "robot_motion": "None",
      "object_motion": "None",
      "physical_logic": "The person is preparing to interact with the black object."
    },
    {
      "segment_index": 1,
      "time_range": "0:03-0:06",
      "description": "The robotic arm enters the frame and begins to interact with the black object.",
      "key_changes": "The robotic arm enters the frame.",
      "camera": "Static top-down view.",
      "robot_motion": "The robotic arm moves towards the black object.",
      "object_motion": "The robotic arm's gripper makes contact with the black object.",
      "physical_logic": "The robotic arm begins to manipulate the black object."
    },
    {
      "segment_index": 2,
      "time_range": "0:06-0:12",
      "description": "The robotic arm continues to adjust the position of the black object while the person holds it steady.",
      "key_changes": "The black object is adjusted slightly.",
      "camera": "Static top-down view.",
      "robot_motion": "The robotic arm moves the black object.",
      "object_motion": "The black object is adjusted.",
      "physical_logic": "The robotic arm continues to manipulate the black object."
    }
  ],
  "transitions": [],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "4:3",
  "duration": "12.433s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Robotic arm with a black and white gripper.",
    "end_effector": "Black and white gripper.",
    "manipulated_objects": "Black object.",
    "task_goal": "Manipulate the black object.",
    "initial_state": "The robotic arm is not in the frame.",
    "terminal_state": "The robotic arm is interacting with the black object.",
    "affordances": "The black object can be grasped and moved.",
    "failure_modes": "The robotic arm may fail to grasp the black object or may move it in an unintended way."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The robotic arm's gripper makes contact with the black object and adjusts its position.",
    "support_and_stability": "The black object is held steady by the person while the robotic arm manipulates it.",
    "force_and_motion": "The robotic arm applies force to move the black object.",
    "causal_chain": "The robotic arm enters the frame and begins to manipulate the black object.",
    "object_permanence": "The black object remains in the scene throughout the video.",
    "next_action_prediction": "The robotic arm will continue to adjust the position of the black object."
  },
  "uncertainties": [],
  "comprehensive_t2v_caption": "A top-down view of a white table with a blue mat on the left side and various objects on the right side. A person wearing blue gloves is seen holding a black object on the blue mat. A robotic arm with a black and white gripper enters from the left and begins to interact with the black object. The robotic arm moves the black object slightly, adjusting its position on the mat. The person continues to hold the object steady while the robotic arm manipulates it. The scene remains focused on the interaction between the robotic arm and the black object throughout the video."
}
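
Reviewer note: the outputs for this video do not all use the same keys. For example, the "physical_reasoning" block above contains "force_and_motion", while the Qwen outputs for the same prompt contain "force_gravity_friction". The p08 schema itself is not reproduced in this section, so the sketch below compares one output against another rather than against the schema; which key is the intended one is not something it can decide. The function name `key_diff` is hypothetical, and the field values are omitted because only the keys matter here.

```python
def key_diff(reference: dict, candidate: dict, path: str = "") -> list[str]:
    """List keys present in only one of two nested dicts, with dotted paths."""
    findings = []
    for key in sorted(reference.keys() - candidate.keys()):
        findings.append(f"missing: {path}{key}")
    for key in sorted(candidate.keys() - reference.keys()):
        findings.append(f"extra: {path}{key}")
    # Recurse only into keys that hold a dict on both sides.
    for key in sorted(reference.keys() & candidate.keys()):
        if isinstance(reference[key], dict) and isinstance(candidate[key], dict):
            findings.extend(key_diff(reference[key], candidate[key], f"{path}{key}."))
    return findings


# Key names copied from two "physical_reasoning" blocks in this review.
qwen_keys = {
    "physical_reasoning": {
        "contact_and_grasp": "",
        "support_and_stability": "",
        "force_gravity_friction": "",
        "causal_chain": "",
        "object_permanence": "",
        "next_action_prediction": "",
    }
}
cosmos_nano_keys = {
    "physical_reasoning": {
        "contact_and_grasp": "",
        "support_and_stability": "",
        "force_and_motion": "",
        "causal_chain": "",
        "object_permanence": "",
        "next_action_prediction": "",
    }
}

if __name__ == "__main__":
    for line in key_diff(qwen_keys, cosmos_nano_keys):
        print(line)
```

In practice the two dicts would come from `json.loads` on the full output text of each model, which would also surface differences such as the additional per-subject fields in the output above.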

Cosmos 3 Super (64B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Super (64B) · 22.8s · 2026-06-10T16:10:05+0000

Short Read

The video starts with a top-down view of a table with various objects. A person wearing blue gloves is seen interacting with a black rectangular object on the left side of the table. A robotic arm with a black and white gripper is positioned on the left side of the table, moving towards the black rectangular object. The robotic arm then picks up the black rectangular object and moves it to the right side of the table. The person wearing blue gloves then places the black rectangular object back on the left side of the table. The robotic arm then picks up the black rectangular object again and moves it to the right side of the table. The video ends with the black rectangular object being placed on the right side of the table.

Training Caption Field

A top-down view of a white table with various objects arranged on it. On the left side, there is a blue mat with a black rectangular object, a black cloth, and a white brush. A person wearing blue gloves is interacting with the black rectangular object. On the right side, there are several colorful objects including a Rubik's cube, a blue ball, a red container, and some electronic components. A robotic arm with a black and white gripper is positioned on the left side of the table, moving towards the black rectangular object. The robotic arm then picks up the black rectangular object and moves it to the right side of the table. The person wearing blue gloves then places the black rectangular object back on the left side of the table. The robotic arm then picks up the black rectangular object again and moves it to the right side of the table. The video ends with the black rectangular object being placed on the right side of the table.

Full output JSON/text
{
  "scene_imagination": "A top-down view of a white table with various objects arranged on it. On the left side, there is a blue mat with a black rectangular object, a black cloth, and a white brush. A person wearing blue gloves is interacting with the black rectangular object. On the right side, there are several colorful objects including a Rubik's cube, a blue ball, a red container, and some electronic components. A robotic arm with a black and white gripper is positioned on the left side of the table, moving towards the black rectangular object.",
  "temporal_caption": "The video starts with a top-down view of a table with various objects. A person wearing blue gloves is seen interacting with a black rectangular object on the left side of the table. A robotic arm with a black and white gripper is positioned on the left side of the table, moving towards the black rectangular object. The robotic arm then picks up the black rectangular object and moves it to the right side of the table. The person wearing blue gloves then places the black rectangular object back on the left side of the table. The robotic arm then picks up the black rectangular object again and moves it to the right side of the table. The video ends with the black rectangular object being placed on the right side of the table.",
  "audio_description": "The video does not contain any audio.",
  "subjects": [
    {
      "description": "A person wearing blue gloves and a white shirt.",
      "appearance_details": "The person is wearing blue gloves and a white shirt.",
      "relationship": "The person is interacting with the black rectangular object.",
      "location": "Left side of the table.",
      "relative_size": "Medium.",
      "orientation": "Facing the table.",
      "pose": "Standing and leaning over the table.",
      "action": "The person is interacting with the black rectangular object.",
      "state_changes": "The person moves their hand to interact with the black rectangular object.",
      "clothing": "White shirt and blue gloves.",
      "expression": "Not visible.",
      "gender": "Unknown.",
      "age": "Unknown.",
      "skin_tone_and_texture": "Not visible.",
      "number_of_arms": 2,
      "number_of_legs": 0
    },
    {
      "description": "A robotic arm with a black and white gripper.",
      "appearance_details": "The robotic arm is black and white with a gripper at the end.",
      "relationship": "The robotic arm is interacting with the black rectangular object.",
      "location": "Left side of the table.",
      "relative_size": "Medium.",
      "orientation": "Facing the table.",
      "pose": "The robotic arm is positioned on the left side of the table.",
      "action": "The robotic arm is moving towards the black rectangular object.",
      "state_changes": "The robotic arm moves towards the black rectangular object and picks it up."
    }
  ],
  "background_setting": "The video is set in a room with a white table. On the table, there are various objects including a blue mat, a black rectangular object, a black cloth, a white brush, a Rubik's cube, a blue ball, a red container, and some electronic components.",
  "lighting": {
    "conditions": "Bright indoor lighting.",
    "direction": "Top-down lighting.",
    "shadows": "Soft shadows on the table.",
    "illumination_effect": "Even illumination across the table."
  },
  "aesthetics": {
    "composition": "Top-down view of the table with objects arranged on it.",
    "color_scheme": "A mix of colors including blue, black, white, red, and green.",
    "mood_atmosphere": "Neutral and focused on the task.",
    "patterns": "No specific patterns visible."
  },
  "cinematography": {
    "camera_motion": "Static camera.",
    "framing": "Top-down view of the table.",
    "camera_angle": "Bird's-eye view.",
    "depth_of_field": "Deep depth of field.",
    "focus": "Sharp focus on the table and objects.",
    "lens_focal_length": "Standard lens."
  },
  "style_medium": "Real-world robot manipulation footage.",
  "artistic_style": "Realistic and functional.",
  "context": "The video shows a robot arm interacting with a black rectangular object on a table. The robot arm is moving the object from one side of the table to the other.",
  "actions": [
    {
      "time": "0-2s",
      "description": "The person wearing blue gloves is seen interacting with the black rectangular object.",
      "actor": "Person wearing blue gloves.",
      "object": "Black rectangular object.",
      "contact_event": "The person's hand touches the black rectangular object.",
      "state_changes": "The person's hand moves to interact with the black rectangular object."
    },
    {
      "time": "2-5s",
      "description": "The robotic arm moves towards the black rectangular object.",
      "actor": "Robotic arm.",
      "object": "Black rectangular object.",
      "contact_event": "The robotic arm's gripper touches the black rectangular object.",
      "state_changes": "The robotic arm moves towards the black rectangular object."
    },
    {
      "time": "5-8s",
      "description": "The robotic arm picks up the black rectangular object.",
      "actor": "Robotic arm.",
      "object": "Black rectangular object.",
      "contact_event": "The robotic arm's gripper grasps the black rectangular object.",
      "state_changes": "The robotic arm picks up the black rectangular object."
    },
    {
      "time": "8-10s",
      "description": "The person wearing blue gloves places the black rectangular object back on the left side of the table.",
      "actor": "Person wearing blue gloves.",
      "object": "Black rectangular object.",
      "contact_event": "The person's hand places the black rectangular object on the table.",
      "state_changes": "The black rectangular object is placed back on the left side of the table."
    },
    {
      "time": "10-12s",
      "description": "The robotic arm picks up the black rectangular object again and moves it to the right side of the table.",
      "actor": "Robotic arm.",
      "object": "Black rectangular object.",
      "contact_event": "The robotic arm's gripper grasps the black rectangular object.",
      "state_changes": "The robotic arm picks up the black rectangular object and moves it to the right side of the table."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0-2s",
      "description": "The person wearing blue gloves is seen interacting with the black rectangular object.",
      "key_changes": "The person's hand moves to interact with the black rectangular object.",
      "camera": "Static camera.",
      "robot_motion": "None.",
      "object_motion": "The person's hand moves to interact with the black rectangular object."
    },
    {
      "segment_index": 1,
      "time_range": "2-5s",
      "description": "The robotic arm moves towards the black rectangular object.",
      "key_changes": "The robotic arm moves towards the black rectangular object.",
      "camera": "Static camera.",
      "robot_motion": "The robotic arm moves towards the black rectangular object.",
      "object_motion": "None."
    },
    {
      "segment_index": 2,
      "time_range": "5-8s",
      "description": "The robotic arm picks up the black rectangular object.",
      "key_changes": "The robotic arm picks up the black rectangular object.",
      "camera": "Static camera.",
      "robot_motion": "The robotic arm picks up the black rectangular object.",
      "object_motion": "The black rectangular object is picked up by the robotic arm."
    },
    {
      "segment_index": 3,
      "time_range": "8-10s",
      "description": "The person wearing blue gloves places the black rectangular object back on the left side of the table.",
      "key_changes": "The black rectangular object is placed back on the left side of the table.",
      "camera": "Static camera.",
      "robot_motion": "None.",
      "object_motion": "The black rectangular object is placed back on the left side of the table."
    },
    {
      "segment_index": 4,
      "time_range": "10-12s",
      "description": "The robotic arm picks up the black rectangular object again and moves it to the right side of the table.",
      "key_changes": "The robotic arm picks up the black rectangular object and moves it to the right side of the table.",
      "camera": "Static camera.",
      "robot_motion": "The robotic arm picks up the black rectangular object and moves it to the right side of the table.",
      "object_motion": "The black rectangular object is moved to the right side of the table."
    }
  ],
  "transitions": [
    "The video starts with a top-down view of the table with various objects.",
    "The person wearing blue gloves is seen interacting with the black rectangular object.",
    "The robotic arm moves towards the black rectangular object.",
    "The robotic arm picks up the black rectangular object.",
    "The person wearing blue gloves places the black rectangular object back on the left side of the table.",
    "The robotic arm picks up the black rectangular object again and moves it to the right side of the table."
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "16,9",
  "duration": "12.43s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "A robotic arm with a black and white gripper.",
    "end_effector": "A black and white gripper.",
    "manipulated_objects": [
      "Black rectangular object."
    ],
    "task_goal": "The robot arm is moving the black rectangular object from one side of the table to the other.",
    "initial_state": "The black rectangular object is on the left side of the table.",
    "terminal_state": "The black rectangular object is on the right side of the table.",
    "affordances": [
      "The robotic arm can grasp and move objects.",
      "The robotic arm can pick up and place objects."
    ],
    "failure_modes": [
      "The robotic arm may drop the black rectangular object.",
      "The robotic arm may not be able to grasp the black rectangular object."
    ]
  },
  "physical_reasoning": {
    "contact_and_grasp": "The robotic arm's gripper grasps the black rectangular object.",
    "support_and_stability": "The black rectangular object is supported by the table.",
    "force_gravity_friction": "The robotic arm applies force to move the black rectangular object.",
    "causal_chain": "The robotic arm moves towards the black rectangular object, grasps it, and moves it to the right side of the table.",
    "object_permanence": "The black rectangular object remains on the table throughout the video.",
    "next_action_prediction": "The robotic arm will continue to move the black rectangular object to the right side of the table."
  },
  "uncertainties": [],
  "comprehensive_t2v_caption": "A top-down view of a white table with various objects arranged on it. On the left side, there is a blue mat with a black rectangular object, a black cloth, and a white brush. A person wearing blue gloves is interacting with the black rectangular object. On the right side, there are several colorful objects including a Rubik's cube, a blue ball, a red container, and some electronic components. A robotic arm with a black and white gripper is positioned on the left side of the table, moving towards the black rectangular object. The robotic arm then picks up the black rectangular object and moves it to the right side of the table. The person wearing blue gloves then places the black rectangular object back on the left side of the table. The robotic arm then picks up the black rectangular object again and moves it to the right side of the table. The video ends with the black rectangular object being placed on the right side of the table."
}

Cosmos 3 Super (64B) / p09: Reproduction schema

ok
p09: Reproduction schema · Cosmos 3 Super (64B) · 22.1s · 2026-06-10T16:10:27+0000

Short Read

A robotic arm equipped with a black and white gripper manipulates a black rectangular object on a gray mat, while a person wearing blue gloves observes and assists from above.

Training Caption Field

A top-down view of a workspace where a robotic arm interacts with a black rectangular object on a gray mat. The robotic arm, with a black and white gripper, moves towards the object, grasps it, and then retracts slightly. A person wearing blue gloves is visible at the top of the frame, observing the action. Various objects, including a purple bowl, a white box, a red container with a blue sphere, and a Rubik's cube, are arranged around the workspace. The scene is well-lit and stable, with no significant changes in the arrangement of objects.

Full output JSON/text
{
  "schema_version": "p09_reproduction_v1",
  "one_sentence_summary": "A robotic arm equipped with a black and white gripper manipulates a black rectangular object on a gray mat, while a person wearing blue gloves observes and assists from above.",
  "video_metadata": {
    "resolution": {
      "W": 1152,
      "H": 816
    },
    "aspect_ratio": "16,9",
    "duration_sec": 12.433,
    "fps": 30.0,
    "frame_count": 373
  },
  "camera": {
    "viewpoint": "top-down",
    "framing": "medium shot",
    "camera_angle": "bird's-eye",
    "camera_motion": "static",
    "estimated_height_or_distance": "1.5 meters",
    "lens_or_distortion": "none",
    "crop_boundaries": "none",
    "stability": "stable"
  },
  "scene_layout": {
    "coordinate_frame": "2D image plane",
    "background": "white table surface with various objects arranged around the central workspace",
    "surface": "white table with a gray mat in the center",
    "lighting": "even indoor lighting",
    "global_spatial_map": "central gray mat with a black rectangular object, surrounded by various objects including a purple bowl, white box, red container, blue sphere, Rubik's cube, and other miscellaneous items",
    "occlusions": "none"
  },
  "subjects": [
    {
      "id": "robotic_arm",
      "category": "robot",
      "appearance": "black and white mechanical arm with a gripper",
      "visible_parts": "arm, gripper",
      "initial_pose": "arm extended towards the black rectangular object",
      "final_pose": "arm retracted slightly",
      "workspace_location": "left side of the frame",
      "motion_path": "moves towards the black rectangular object, then retracts",
      "end_effector": "gripper",
      "uncertainties": "none"
    },
    {
      "id": "person",
      "category": "human",
      "appearance": "wearing blue gloves and a white shirt",
      "visible_parts": "hands, arms",
      "initial_pose": "hands resting on the table",
      "final_pose": "hands resting on the table",
      "workspace_location": "top center of the frame",
      "motion_path": "hands move slightly",
      "end_effector": "hands",
      "uncertainties": "none"
    }
  ],
  "objects": [
    {
      "id": "black_rectangular_object",
      "category": "object",
      "appearance": "black, rectangular, with a textured surface",
      "label_or_text": "none",
      "material": "plastic or metal",
      "shape": "rectangular",
      "size_relative_to_frame": "medium",
      "initial_pose": "resting on the gray mat",
      "final_pose": "resting on the gray mat",
      "state_changes": "none",
      "uncertainties": "none"
    },
    {
      "id": "gray_mat",
      "category": "object",
      "appearance": "gray, textured surface",
      "label_or_text": "none",
      "material": "fabric or foam",
      "shape": "rectangular",
      "size_relative_to_frame": "large",
      "initial_pose": "resting on the white table",
      "final_pose": "resting on the white table",
      "state_changes": "none",
      "uncertainties": "none"
    },
    {
      "id": "purple_bowl",
      "category": "object",
      "appearance": "purple, circular, containing small white objects",
      "label_or_text": "none",
      "material": "plastic",
      "shape": "circular",
      "size_relative_to_frame": "small",
      "initial_pose": "resting on the gray mat",
      "final_pose": "resting on the gray mat",
      "state_changes": "none",
      "uncertainties": "none"
    },
    {
      "id": "white_box",
      "category": "object",
      "appearance": "white, rectangular, with orange accents",
      "label_or_text": "none",
      "material": "plastic",
      "shape": "rectangular",
      "size_relative_to_frame": "small",
      "initial_pose": "resting on the gray mat",
      "final_pose": "resting on the gray mat",
      "state_changes": "none",
      "uncertainties": "none"
    },
    {
      "id": "red_container",
      "category": "object",
      "appearance": "red, rectangular, containing a blue sphere and other objects",
      "label_or_text": "none",
      "material": "plastic",
      "shape": "rectangular",
      "size_relative_to_frame": "medium",
      "initial_pose": "resting on the white table",
      "final_pose": "resting on the white table",
      "state_changes": "none",
      "uncertainties": "none"
    },
    {
      "id": "blue_sphere",
      "category": "object",
      "appearance": "blue, spherical",
      "label_or_text": "none",
      "material": "plastic",
      "shape": "spherical",
      "size_relative_to_frame": "small",
      "initial_pose": "resting inside the red container",
      "final_pose": "resting inside the red container",
      "state_changes": "none",
      "uncertainties": "none"
    },
    {
      "id": "rubiks_cube",
      "category": "object",
      "appearance": "multicolored, cubic",
      "label_or_text": "none",
      "material": "plastic",
      "shape": "cubic",
      "size_relative_to_frame": "small",
      "initial_pose": "resting on the white table",
      "final_pose": "resting on the white table",
      "state_changes": "none",
      "uncertainties": "none"
    }
  ],
  "initial_state": "The robotic arm is positioned over the black rectangular object on the gray mat. The person's hands are resting on the table. Various objects are arranged around the workspace.",
  "final_state": "The robotic arm has retracted slightly. The person's hands are still resting on the table. The objects remain in their initial positions.",
  "timeline": [
    {
      "time_range": "0-4s",
      "description": "The robotic arm moves towards the black rectangular object.",
      "active_subjects": [
        "robotic_arm"
      ],
      "object_states": {
        "black_rectangular_object": "resting on the gray mat"
      },
      "robot_motion": "moving towards the black rectangular object",
      "object_motion": "none",
      "gripper_state": "open",
      "contact_state": "none",
      "spatial_changes": "none",
      "physical_causality": "none"
    },
    {
      "time_range": "4-8s",
      "description": "The robotic arm grasps the black rectangular object.",
      "active_subjects": [
        "robotic_arm"
      ],
      "object_states": {
        "black_rectangular_object": "being grasped by the robotic arm"
      },
      "robot_motion": "grasping the black rectangular object",
      "object_motion": "none",
      "gripper_state": "closed",
      "contact_state": "contact with the black rectangular object",
      "spatial_changes": "none",
      "physical_causality": "none"
    },
    {
      "time_range": "8-12s",
      "description": "The robotic arm retracts slightly.",
      "active_subjects": [
        "robotic_arm"
      ],
      "object_states": {
        "black_rectangular_object": "resting on the gray mat"
      },
      "robot_motion": "retracting",
      "object_motion": "none",
      "gripper_state": "open",
      "contact_state": "none",
      "spatial_changes": "none",
      "physical_causality": "none"
    }
  ],
  "parallel_action_graph": {
    "simultaneous_actions": [
      {
        "action": "robotic arm moving towards the black rectangular object",
        "subject": "robotic_arm"
      }
    ],
    "dependencies": [],
    "ordering_constraints": []
  },
  "keyframes": [
    {
      "time": 0.0,
      "frame_description": "The robotic arm is positioned over the black rectangular object on the gray mat. The person's hands are resting on the table. Various objects are arranged around the workspace.",
      "subject_poses": {
        "robotic_arm": "arm extended towards the black rectangular object",
        "person": "hands resting on the table"
      },
      "object_poses": {
        "black_rectangular_object": "resting on the gray mat",
        "gray_mat": "resting on the white table",
        "purple_bowl": "resting on the gray mat",
        "white_box": "resting on the gray mat",
        "red_container": "resting on the white table",
        "blue_sphere": "resting inside the red container",
        "rubiks_cube": "resting on the white table"
      },
      "contacts": [],
      "important_visual_details": []
    },
    {
      "time": 4.0,
      "frame_description": "The robotic arm grasps the black rectangular object.",
      "subject_poses": {
        "robotic_arm": "arm grasping the black rectangular object",
        "person": "hands resting on the table"
      },
      "object_poses": {
        "black_rectangular_object": "being grasped by the robotic arm",
        "gray_mat": "resting on the white table",
        "purple_bowl": "resting on the gray mat",
        "white_box": "resting on the gray mat",
        "red_container": "resting on the white table",
        "blue_sphere": "resting inside the red container",
        "rubiks_cube": "resting on the white table"
      },
      "contacts": [
        {
          "subject_id": "robotic_arm",
          "object_id": "black_rectangular_object",
          "contact_type": "grasping"
        }
      ],
      "important_visual_details": []
    },
    {
      "time": 8.0,
      "frame_description": "The robotic arm retracts slightly.",
      "subject_poses": {
        "robotic_arm": "arm retracted slightly",
        "person": "hands resting on the table"
      },
      "object_poses": {
        "black_rectangular_object": "resting on the gray mat",
        "gray_mat": "resting on the white table",
        "purple_bowl": "resting on the gray mat",
        "white_box": "resting on the gray mat",
        "red_container": "resting on the white table",
        "blue_sphere": "resting inside the red container",
        "rubiks_cube": "resting on the white table"
      },
      "contacts": [],
      "important_visual_details": []
    }
  ],
  "contact_events": [
    {
      "time": 4.0,
      "subject_id": "robotic_arm",
      "object_id": "black_rectangular_object",
      "contact_type": "grasping",
      "contact_location": "black rectangular object",
      "grasp_or_support_quality": "firm",
      "release_or_transfer": "none",
      "visible_evidence": "robotic arm grasps the black rectangular object"
    }
  ],
  "physics_and_affordances": {
    "support_surfaces": [
      "gray mat",
      "white table"
    ],
    "gravity_and_stability": "stable",
    "force_or_constraint_reasoning": "none",
    "collision_or_near_collision": "none",
    "object_permanence": "none",
    "task_goal": "manipulating the black rectangular object",
    "failure_modes": "none",
    "next_action_prediction": "none"
  },
  "appearance_fidelity": {
    "colors": [
      "black",
      "white",
      "gray",
      "purple",
      "red",
      "blue",
      "multicolored"
    ],
    "textures": [
      "textured",
      "smooth",
      "matte",
      "glossy"
    ],
    "lighting_shadows": "even lighting, soft shadows",
    "robot_morphology": "black and white mechanical arm with a gripper",
    "object_fine_details": [
      "textured surface of the black rectangular object",
      "multicolored faces of the Rubik's cube",
      "blue sphere inside the red container"
    ],
    "background_fine_details": [
      "various objects arranged around the workspace"
    ]
  },
  "text_and_labels": [],
  "audio_description": "none",
  "generation_controls": {
    "shot_type": "medium shot",
    "camera_motion_instruction": "static",
    "motion_style": "none",
    "duration_instruction": "12.433s",
    "spatial_consistency_requirements": "none",
    "negative_constraints": "none",
    "recommended_caption_style": "none"
  },
  "uncertainties": [],
  "comprehensive_reproduction_caption": "A top-down view of a workspace where a robotic arm interacts with a black rectangular object on a gray mat. The robotic arm, with a black and white gripper, moves towards the object, grasps it, and then retracts slightly. A person wearing blue gloves is visible at the top of the frame, observing the action. Various objects, including a purple bowl, a white box, a red container with a blue sphere, and a Rubik's cube, are arranged around the workspace. The scene is well-lit and stable, with no significant changes in the arrangement of objects."
}
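
Review note: the `video_metadata` block in the output above can be cross-checked mechanically. The following is a minimal Python sketch, assuming the p09 field names exactly as they appear in that output. The function name `check_video_metadata` and the 2% ratio tolerance are hypothetical choices for illustration, not part of the review tooling.

```python
from math import gcd


def check_video_metadata(meta, ratio_tolerance=0.02):
    """Return a list of inconsistencies found in a p09-style video_metadata dict."""
    problems = []
    width = meta["resolution"]["W"]
    height = meta["resolution"]["H"]

    # Reduce W:H to lowest terms so the report can show the true ratio.
    divisor = gcd(width, height)
    reduced = f"{width // divisor}:{height // divisor}"

    # Accept both the "16:9" and the "16,9" spelling of the declared ratio.
    declared_text = meta["aspect_ratio"].replace(",", ":")
    numerator, denominator = (float(part) for part in declared_text.split(":"))
    declared = numerator / denominator
    actual = width / height
    if abs(declared - actual) > ratio_tolerance * actual:
        problems.append(
            f"aspect_ratio {meta['aspect_ratio']!r} does not match "
            f"resolution {width}x{height} ({reduced})"
        )

    # The frame count should equal duration times frame rate, rounded.
    expected_frames = round(meta["duration_sec"] * meta["fps"])
    if expected_frames != meta["frame_count"]:
        problems.append(
            f"frame_count {meta['frame_count']} does not match "
            f"duration_sec * fps = {expected_frames}"
        )
    return problems


# Values copied from the video_metadata block above.
metadata = {
    "resolution": {"W": 1152, "H": 816},
    "aspect_ratio": "16,9",
    "duration_sec": 12.433,
    "fps": 30.0,
    "frame_count": 373,
}

for problem in check_video_metadata(metadata):
    print(problem)
```

With these values the frame count is consistent (12.433 s at 30 fps rounds to 373 frames), while the declared aspect ratio is flagged because 1152x816 reduces to 24:17 rather than 16:9.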

rovid_002

Qwen 3.5 9B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 9B · 77.6s · 2026-06-10T17:43:32+0000

Short Read

At the beginning, the robot arm holds a small patterned object above a grey cup. In the middle, the robot lowers the object into the cup. By the end, the object is fully inside the cup, and the robot arm retracts slightly.

Training Caption Field

A white robotic arm with black joints and a green indicator light at its base is positioned over a wooden table covered with a grey textured cloth. The robot holds a small rectangular object with a red and white pattern in its black gripper. A grey cylindrical cup sits on the table. The robot lowers the object into the cup, placing it inside, and then retracts its arm. The background features a dark brown textured wall on the left and a greenish-grey textured wall on the right, under dim ambient lighting.

Full output JSON/text
{
  "scene_imagination": "A white, multi-jointed robotic arm with a black gripper is positioned over a wooden table covered with a grey textured cloth. The robot holds a small, rectangular object with a red and white pattern. A grey cylindrical cup sits on the table. The background features a dark brown textured wall on the left and a greenish-grey textured wall on the right. The lighting is dim and ambient.",
  "temporal_caption": "At the beginning, the robot arm holds a small patterned object above a grey cup. In the middle, the robot lowers the object into the cup. By the end, the object is fully inside the cup, and the robot arm retracts slightly.",
  "audio_description": null,
  "subjects": [
    {
      "description": "White robotic arm with black joints and a gripper",
      "appearance_details": "Smooth white segments, black circular joints, green indicator light at the base, black gripper mechanism",
      "relationship": "Manipulating the small object and interacting with the cup",
      "location": "Center-left of the frame, extending over the table",
      "relative_size": "Large compared to the object, medium compared to the cup",
      "orientation": "Angled downwards towards the cup",
      "pose": "Extended and articulated",
      "action": "Holding, lowering, and releasing an object",
      "state_changes": "Moves from a raised position to a lowered position inside the cup"
    },
    {
      "description": "Small rectangular object",
      "appearance_details": "White with red and black markings",
      "relationship": "Being held by the robot arm",
      "location": "Initially in the gripper, then inside the cup",
      "relative_size": "Small, fits within the gripper",
      "orientation": "Vertical",
      "pose": "Held upright",
      "action": "Being placed into the cup",
      "state_changes": "Moves from outside the cup to inside the cup"
    },
    {
      "description": "Grey cylindrical cup",
      "appearance_details": "Matte grey, open top",
      "relationship": "Receiving the object from the robot",
      "location": "On the table, to the right of the robot arm",
      "relative_size": "Medium, larger than the object",
      "orientation": "Vertical",
      "pose": "Stationary",
      "action": "Receiving the object",
      "state_changes": "Object moves from outside to inside"
    }
  ],
  "background_setting": "An indoor environment with a wooden table surface covered by a grey textured cloth. The walls have a distinct texture; the left wall is dark brown with a grid-like pattern, and the right wall is greenish-grey with a rougher texture.",
  "lighting": {
    "conditions": "Dim, low-light environment",
    "direction": "Ambient, seemingly from above and slightly to the right",
    "shadows": "Soft shadows cast by the robot arm and cup onto the table",
    "illumination_effect": "Creates a moody, slightly dark atmosphere"
  },
  "aesthetics": {
    "composition": "Medium shot focusing on the robot arm and the interaction with the cup",
    "color_scheme": "Muted tones: white, grey, brown, and dark green",
    "mood_atmosphere": "Clinical, focused, slightly mysterious due to low lighting",
    "patterns": "Textured walls and tablecloth, grid pattern on the left wall"
  },
  "cinematography": {
    "framing": "Medium close-up",
    "camera_angle": "High angle, looking down at the table",
    "depth_of_field": "Shallow, focusing on the robot and cup",
    "focus": "Sharp on the robot arm and cup",
    "lens_focal_length": "Unknown",
    "camera_motion": "Static"
  },
  "style_medium": "robotics video",
  "artistic_style": "synthetic robot simulation",
  "context": "A robot is performing a simple manipulation task, likely a demonstration of object placement or grasping capability.",
  "actions": [
    {
      "time": "beginning",
      "description": "The robot arm holds a small patterned object above a grey cup.",
      "actor": "robotic arm",
      "object": "small patterned object",
      "contact_event": "gripper holding object",
      "state_change": "none",
      "evidence": "object visible in gripper"
    },
    {
      "time": "middle",
      "description": "The robot arm lowers the object towards the cup.",
      "actor": "robotic arm",
      "object": "small patterned object",
      "contact_event": "object approaching cup opening",
      "state_change": "object moving downwards",
      "evidence": "arm extending downwards"
    },
    {
      "time": "end",
      "description": "The robot arm places the object inside the cup and retracts.",
      "actor": "robotic arm",
      "object": "small patterned object",
      "contact_event": "object inside cup",
      "state_change": "object now inside cup",
      "evidence": "object no longer visible in gripper"
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "0.0s - 2.0s",
      "description": "The robot arm holds the object steady above the cup.",
      "key_changes": "none",
      "camera": "static",
      "robot_motion": "minimal adjustment",
      "object_motion": "none",
      "physical_logic": "preparation for placement"
    },
    {
      "segment_index": 2,
      "time_range": "2.0s - 5.0s",
      "description": "The robot arm lowers the object into the cup.",
      "key_changes": "object enters cup",
      "camera": "static",
      "robot_motion": "downward translation",
      "object_motion": "downward translation",
      "physical_logic": "gravity-assisted placement"
    },
    {
      "segment_index": 3,
      "time_range": "5.0s - 9.767s",
      "description": "The robot arm retracts after placing the object.",
      "key_changes": "arm moves away",
      "camera": "static",
      "robot_motion": "retraction",
      "object_motion": "none",
      "physical_logic": "task completion"
    }
  ],
  "transitions": [
    "object moves from gripper to cup",
    "arm retracts after placement"
  ],
  "resolution": {
    "W": 1360,
    "H": 688
  },
  "aspect_ratio": "16:9",
  "duration": "9.767s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "white multi-jointed arm with black joints",
    "end_effector": "black gripper",
    "manipulated_objects": [
      "small patterned object",
      "grey cup"
    ],
    "task_goal": "place object into cup",
    "initial_state": "object held above cup",
    "terminal_state": "object inside cup, arm retracted",
    "affordances": "gripper for holding, joints for positioning",
    "failure_modes": "none observed"
  },
  "physical_reasoning": {
    "contact_and_grasp": "gripper securely holds object before release",
    "support_and_stability": "cup provides stable base for object",
    "force_gravity_friction": "gravity assists object into cup, friction holds it inside",
    "causal_chain": "arm moves down -> object enters cup -> arm retracts",
    "object_permanence": "object remains inside cup",
    "next_action_prediction": "robot may retrieve object or move to next task"
  },
  "uncertainties": [
    "exact contents of the cup before object insertion",
    "specific material properties of the object"
  ],
  "comprehensive_t2v_caption": "A white robotic arm with black joints and a green indicator light at its base is positioned over a wooden table covered with a grey textured cloth. The robot holds a small rectangular object with a red and white pattern in its black gripper. A grey cylindrical cup sits on the table. The robot lowers the object into the cup, placing it inside, and then retracts its arm. The background features a dark brown textured wall on the left and a greenish-grey textured wall on the right, under dim ambient lighting."
}
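
Review note: the `segments` list in the output above can be checked for gaps and for agreement with the reported duration. The following is a minimal Python sketch, assuming the p08 `time_range` strings take the forms seen in this review (`"0.0s - 2.0s"` or `"0-2s"`). The helper names `parse_time_range` and `check_segment_coverage` and the 0.05 s tolerance are hypothetical.

```python
import re


def parse_time_range(text):
    """Parse strings such as '0.0s - 2.0s' or '0-2s' into (start, end) seconds."""
    numbers = re.findall(r"\d+(?:\.\d+)?", text)
    if len(numbers) != 2:
        raise ValueError(f"cannot parse time range: {text!r}")
    start, end = (float(number) for number in numbers)
    return start, end


def check_segment_coverage(segments, duration_sec, tolerance=0.05):
    """Return a list of problems: gaps, overlaps, empty ranges, or a wrong end time."""
    problems = []
    expected_start = 0.0
    for segment in segments:
        index = segment["segment_index"]
        start, end = parse_time_range(segment["time_range"])
        # Each segment should begin where the previous one ended.
        if abs(start - expected_start) > tolerance:
            problems.append(
                f"segment {index} starts at {start}s, expected {expected_start}s"
            )
        if end <= start:
            problems.append(f"segment {index} has an empty or reversed range")
        expected_start = end
    # The last segment should end at the reported video duration.
    if abs(expected_start - duration_sec) > tolerance:
        problems.append(
            f"segments end at {expected_start}s, video lasts {duration_sec}s"
        )
    return problems


# Values copied from the segments and duration fields above.
segments = [
    {"segment_index": 1, "time_range": "0.0s - 2.0s"},
    {"segment_index": 2, "time_range": "2.0s - 5.0s"},
    {"segment_index": 3, "time_range": "5.0s - 9.767s"},
]
duration_sec = float("9.767s".rstrip("s"))

print(check_segment_coverage(segments, duration_sec))
```

For this output the check returns an empty list: the three segments are contiguous and end at the reported 9.767 s. The check only validates timing structure; it says nothing about whether the described events match the video.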

Qwen 3.5 27B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 27B · 176.2s · 2026-06-10T18:02:05+0000

Short Read

At the beginning, the robotic arm is positioned with its gripper open above the grey cup. In the early phase, the arm moves its end-effector directly over the opening of the cup. By the middle of the sequence, the gripper closes around an unseen object inside the cup. In the late phase, the arm lifts the gripper, now holding a small, dark object, out of the cup. At the end, the arm holds the object steady above the cup.

Training Caption Field

A white, multi-jointed robotic arm, mounted on a dark surface, extends over a wooden table covered with a blue textured mat. A grey cylindrical cup sits on the mat. The arm, with a green indicator light at its base, precisely maneuvers its two-fingered gripper towards the cup. The gripper descends into the cup, closes around a small, dark object, and then lifts the object out, holding it steadily above the cup. The scene is dimly lit, highlighting the mechanical precision of the robotic action against a backdrop of textured walls.

Full output JSON/text
{
  "scene_imagination": "A white, multi-jointed robotic arm is mounted on a dark brown surface, positioned next to a wooden table covered with a blue textured mat. On the mat sits a grey cylindrical cup. The background features a dark, textured wall on the left and a green, patterned surface on the right. The lighting is dim and focused, creating a moody atmosphere. The camera is static, providing a high-angle side view of the setup.",
  "temporal_caption": "At the beginning, the robotic arm is positioned with its gripper open above the grey cup. In the early phase, the arm moves its end-effector directly over the opening of the cup. By the middle of the sequence, the gripper closes around an unseen object inside the cup. In the late phase, the arm lifts the gripper, now holding a small, dark object, out of the cup. At the end, the arm holds the object steady above the cup.",
  "audio_description": null,
  "subjects": [
    {
      "description": "A white, articulated robotic arm with black joints and a green indicator light at its base.",
      "appearance_details": "Smooth, white plastic-like segments connected by spherical black joints. The end-effector is a two-fingered gripper.",
      "relationship": "The active agent manipulating the object.",
      "location": "Mounted on the left side of the frame, extending over the table.",
      "relative_size": "Large, dominating the left and center of the frame.",
      "orientation": "Facing right towards the cup.",
      "pose": "Articulated in a curved shape to reach the cup.",
      "action": "Reaching into the cup, grasping an object, and lifting it.",
      "state_changes": "Transitions from an open gripper state to a closed gripper state holding an object."
    },
    {
      "description": "A grey, cylindrical cup.",
      "appearance_details": "Simple, matte grey finish with a slightly tapered shape.",
      "relationship": "The container from which the object is retrieved.",
      "location": "Centered on the blue mat on the wooden table.",
      "relative_size": "Medium-sized relative to the robot arm.",
      "orientation": "Upright.",
      "pose": "Stationary.",
      "action": "Being interacted with by the robot.",
      "state_changes": "Its contents are altered as an object is removed."
    },
    {
      "description": "A small, dark object.",
      "appearance_details": "Indistinct shape, dark color, possibly a tool or small part.",
      "relationship": "The object being manipulated.",
      "location": "Initially inside the cup, then held by the gripper.",
      "relative_size": "Small, fitting within the gripper.",
      "orientation": "Varies as it is grasped.",
      "pose": "Held by the gripper.",
      "action": "Being picked up.",
      "state_changes": "Moves from inside the cup to being held in the air."
    }
  ],
  "background_setting": "An indoor setting with a dark, textured wall on the left and a green, possibly fabric or painted, wall on the right. A wooden table with a blue mat serves as the workspace.",
  "lighting": {
    "conditions": "Dim, artificial lighting.",
    "direction": "Appears to come from the front-left, casting soft shadows.",
    "shadows": "Soft shadows are cast by the robot arm onto the table and mat.",
    "illumination_effect": "Creates a focused, slightly dramatic look on the robot and the immediate workspace."
  },
  "aesthetics": {
    "composition": "The robot arm enters from the left, creating a diagonal line towards the cup in the center-right. The composition is balanced but dynamic due to the arm's pose.",
    "color_scheme": "Muted tones of white, grey, brown, and blue, with a touch of green from the indicator light and background.",
    "mood_atmosphere": "Technical, precise, and slightly mysterious due to the dim lighting.",
    "patterns": "The texture on the blue mat and the background walls provide subtle patterns."
  },
  "cinematography": {
    "framing": "Medium shot, capturing the robot arm, the cup, and part of the surrounding environment.",
    "camera_angle": "High-angle, looking down at the workspace.",
    "depth_of_field": "Relatively deep, keeping both the robot and the background reasonably in focus.",
    "focus": "Sharp focus on the robot arm and the cup.",
    "lens_focal_length": "Unknown, likely a standard lens.",
    "camera_motion": "Static camera."
  },
  "style_medium": "robotics video",
  "artistic_style": "synthetic robot simulation",
  "context": "The video depicts a simulated robotic manipulation task, likely for training or demonstration purposes. The inferred goal is to pick up an object from inside a cup.",
  "actions": [
    {
      "time": "00:00 - 00:02",
      "description": "The robotic arm positions its open gripper over the cup.",
      "actor": "Robotic arm",
      "object": "Cup",
      "contact_event": "None",
      "state_change": "Gripper moves from a resting position to a target position.",
      "evidence": "Visual movement of the arm segments."
    },
    {
      "time": "00:02 - 00:05",
      "description": "The gripper descends into the cup and closes around an object.",
      "actor": "Robotic arm",
      "object": "Object inside cup",
      "contact_event": "Gripper contacts and grasps the object.",
      "state_change": "Gripper closes; object is secured.",
      "evidence": "Gripper fingers move together inside the cup."
    },
    {
      "time": "00:05 - 00:09",
      "description": "The robotic arm lifts the gripper, holding the object, out of the cup.",
      "actor": "Robotic arm",
      "object": "Object",
      "contact_event": "Object is lifted from the cup.",
      "state_change": "Object moves from inside the cup to being held above it.",
      "evidence": "Arm segments articulate to raise the gripper."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "00:00 - 00:02",
      "description": "Robot arm approaches the cup.",
      "key_changes": "Arm moves into position.",
      "camera": "Static",
      "robot_motion": "Approach",
      "object_motion": "None",
      "physical_logic": "Planning and execution of a reach trajectory."
    },
    {
      "segment_index": 2,
      "time_range": "00:02 - 00:05",
      "description": "Robot arm grasps object inside the cup.",
      "key_changes": "Gripper closes.",
      "camera": "Static",
      "robot_motion": "Grasp",
      "object_motion": "None (initially)",
      "physical_logic": "Contact and force application for grasping."
    },
    {
      "segment_index": 3,
      "time_range": "00:05 - 00:09",
      "description": "Robot arm lifts the object from the cup.",
      "key_changes": "Object is lifted.",
      "camera": "Static",
      "robot_motion": "Lift",
      "object_motion": "Upward movement",
      "physical_logic": "Overcoming gravity to lift the object."
    }
  ],
  "transitions": [
    "Approach to grasp",
    "Grasp to lift"
  ],
  "resolution": {
    "W": 1360,
    "H": 688
  },
  "aspect_ratio": "19:9.6",
  "duration": "9.767s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Articulated robotic arm with multiple degrees of freedom.",
    "end_effector": "Two-fingered parallel gripper.",
    "manipulated_objects": "Small, dark object (unspecified).",
    "task_goal": "Pick up an object from inside a cup.",
    "initial_state": "Arm positioned away from cup, gripper open.",
    "terminal_state": "Arm holding object above cup, gripper closed.",
    "affordances": "Grasping, lifting, precise positioning.",
    "failure_modes": "Potential for dropping the object, missing the grasp, or colliding with the cup."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The gripper makes contact with the object inside the cup and applies sufficient force to secure it.",
    "support_and_stability": "The cup remains stable on the mat throughout the interaction.",
    "force_gravity_friction": "The robot applies upward force to overcome gravity and lift the object. Friction between the gripper and the object prevents slipping.",
    "causal_chain": "Arm movement -> Gripper closure -> Object grasp -> Arm lift -> Object lift.",
    "object_permanence": "The object exists inside the cup before being grasped and continues to exist after being lifted.",
    "next_action_prediction": "The robot might move the object to another location or place it back in the cup."
  },
  "uncertainties": [
    "The exact nature of the object being picked up is unclear due to its small size and the angle.",
    "The specific type of robotic arm and its control system are not identifiable."
  ],
  "comprehensive_t2v_caption": "A white, multi-jointed robotic arm, mounted on a dark surface, extends over a wooden table covered with a blue textured mat. A grey cylindrical cup sits on the mat. The arm, with a green indicator light at its base, precisely maneuvers its two-fingered gripper towards the cup. The gripper descends into the cup, closes around a small, dark object, and then lifts the object out, holding it steadily above the cup. The scene is dimly lit, highlighting the mechanical precision of the robotic action against a backdrop of textured walls."
}
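
Reviewer note on aspect_ratio: the outputs for this clip report the same 1360 x 688 resolution but different aspect_ratio strings ("19:9.6" here, "16,0" and "16,9" in the Cosmos 3 outputs below). The sketch below is one possible way to check such a string against the reported resolution. The function names and the 1% tolerance are assumptions made for illustration and are not part of the review pipeline.

```python
from math import gcd, isclose


def reduced_ratio(width, height):
    """Reduce a pixel resolution to its smallest integer ratio."""
    g = gcd(width, height)
    return width // g, height // g


def parse_ratio(text):
    """Parse 'A:B' or 'A,B' into a float; return None if it is unusable."""
    parts = text.replace(",", ":").split(":")
    if len(parts) != 2:
        return None
    try:
        num, den = float(parts[0]), float(parts[1])
    except ValueError:
        return None
    if den == 0:
        return None  # e.g. "16,0" cannot describe a real frame
    return num / den


def ratio_matches(text, width, height, rel_tol=0.01):
    """True if the reported ratio agrees with width/height within rel_tol (assumed 1%)."""
    value = parse_ratio(text)
    return value is not None and isclose(value, width / height, rel_tol=rel_tol)


if __name__ == "__main__":
    print(reduced_ratio(1360, 688))
    for reported in ("19:9.6", "16,0", "16,9"):
        print(reported, ratio_matches(reported, 1360, 688))
```

Under these assumptions, 1360 x 688 reduces to 85:43, so "19:9.6" passes as an approximation while "16,0" and "16,9" do not.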

Cosmos 3 Nano (16B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Nano (16B) · 13.2s · 2026-06-10T18:35:50+0000

Short Read

The robotic arm begins in a raised position, then moves downward and forward toward the gray cylindrical container on the table. It hovers above the container, adjusts its position slightly, and then appears to interact with or place something inside the container. The arm remains in this position for the remainder of the video.

Training Caption Field

A robotic arm with a white and black end-effector, featuring a green indicator light at its base, is positioned over a wooden table covered with a blue cloth. The arm begins in a raised position and moves downward and forward toward a gray cylindrical container on the table. The background consists of dark textured walls on the left and a greenish patterned surface on the right. The lighting is dim, creating soft shadows. The robotic arm appears to interact with the container, suggesting a manipulation task. The scene is a 3D animation or simulation, with a clinical and focused atmosphere.

Full output JSON/text
{
  "scene_imagination": "A robotic arm with a white and black end-effector is positioned over a wooden table covered with a blue cloth. The arm has a green indicator light at its base. A gray cylindrical container sits on the table, and a small white circular object is visible on the left side of the table. The background consists of dark textured walls and a greenish patterned surface on the right. The lighting is dim, creating soft shadows.",
  "temporal_caption": "The robotic arm begins in a raised position, then moves downward and forward toward the gray cylindrical container on the table. It hovers above the container, adjusts its position slightly, and then appears to interact with or place something inside the container. The arm remains in this position for the remainder of the video.",
  "audio_description": "The video is silent, with no discernible audio elements.",
  "subjects": [
    {
      "description": "A robotic arm with a white and black end-effector, featuring a green indicator light at its base.",
      "appearance_details": "The arm has multiple joints and segments, with a white body and black accents. The end-effector has a small orange component and a black gripping mechanism. A green light is illuminated at the base of the arm.",
      "relationship": "Interacting with the gray cylindrical container on the table.",
      "location": "Center to left side of the frame.",
      "relative_size": "Large within the frame.",
      "orientation": "Facing the table and the container.",
      "pose": "Initially raised, then moving downward and forward.",
      "action": "Moving toward and interacting with the gray container.",
      "state_changes": "The arm transitions from a raised position to a lowered position near the container."
    },
    {
      "description": "A gray cylindrical container with an open top.",
      "appearance_details": "The container is cylindrical, made of a matte gray material, and has an open top.",
      "relationship": "Being interacted with by the robotic arm.",
      "location": "Right side of the table.",
      "relative_size": "Medium within the frame.",
      "orientation": "Upright on the table.",
      "pose": "Stationary.",
      "action": "Remains stationary while the robotic arm interacts with it.",
      "state_changes": "No significant change."
    },
    {
      "description": "A small white circular object on the table.",
      "appearance_details": "The object is flat, circular, and white.",
      "relationship": "Stationary on the table, unrelated to the robotic arm's action.",
      "location": "Left side of the table.",
      "relative_size": "Small within the frame.",
      "orientation": "Flat on the table.",
      "pose": "Stationary.",
      "action": "Remains stationary.",
      "state_changes": "No significant change."
    }
  ],
  "background_setting": "The scene takes place indoors, with a wooden table covered by a blue cloth. The background consists of dark textured walls on the left and a greenish patterned surface on the right. The lighting is dim, creating a subdued atmosphere.",
  "lighting": {
    "conditions": "Dim indoor lighting.",
    "direction": "Top-lit, casting soft shadows.",
    "shadows": "Soft shadows on the table and walls.",
    "illumination_effect": "The green indicator light on the robotic arm is the most prominent source of illumination in the scene."
  },
  "aesthetics": {
    "composition": "Centered on the robotic arm and the container, with the table and background elements framing the scene.",
    "color_scheme": "Muted tones, with white, black, gray, and blue dominating the scene.",
    "mood_atmosphere": "Clinical and focused, typical of a robotic or laboratory setting.",
    "patterns": "The blue cloth on the table has a subtle textured pattern."
  },
  "cinematography": {
    "camera_angle": "High angle, looking down at the table.",
    "framing": "Medium shot, capturing the robotic arm, the table, and the background.",
    "depth_of_field": "Deep, with both the foreground and background in focus.",
    "focus": "Sharp focus on the robotic arm and the container.",
    "lens_focal_length": "Standard.",
    "camera_motion": "Static."
  },
  "style_medium": "3D animation or simulation.",
  "artistic_style": "Realistic 3D rendering.",
  "context": "A robotic manipulation task, likely a simulation or demonstration of a robotic arm interacting with objects on a table.",
  "actions": [
    {
      "time": "0:00-0:03",
      "description": "The robotic arm is raised, with its end-effector positioned above the table.",
      "actor": "Robotic arm.",
      "object": "End-effector.",
      "contact_event": "No contact with the container.",
      "state_change": "The arm begins to move downward."
    },
    {
      "time": "0:03-0:06",
      "description": "The robotic arm moves downward and forward toward the gray cylindrical container.",
      "actor": "Robotic arm.",
      "object": "End-effector.",
      "contact_event": "The end-effector hovers above the container.",
      "state_change": "The arm reaches the container."
    },
    {
      "time": "0:06-0:09",
      "description": "The robotic arm remains in position near the container, appearing to interact with it.",
      "actor": "Robotic arm.",
      "object": "End-effector.",
      "contact_event": "The end-effector is in contact with the container.",
      "state_change": "The arm maintains its position."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0:00-0:03",
      "description": "The robotic arm is raised and begins to move downward.",
      "key_changes": "The arm transitions from a raised position to a lowered position.",
      "camera": "Static high angle shot.",
      "robot_motion": "The arm moves downward and forward.",
      "object_motion": "The end-effector moves toward the container.",
      "physical_reasoning": "The arm's movement is smooth and controlled, suggesting a programmed task."
    }
  ],
  "transitions": [],
  "resolution": {
    "W": 1360,
    "H": 688
  },
  "aspect_ratio": "16,0",
  "duration": "9.767s",
  "fps": 30,
  "robotics": {
    "embodiment": "Manipulation robot",
    "end_effector": "Robotic arm with a white and black end-effector.",
    "manipulated_objects": "Gray cylindrical container.",
    "task_goal": "Interact with the gray cylindrical container.",
    "initial_state": "The robotic arm is raised, with its end-effector positioned above the table.",
    "terminal_state": "The robotic arm is in position near the container, appearing to interact with it.",
    "affordances": "The container is open at the top, allowing the robotic arm to interact with it.",
    "failure_modes": "No failure modes are evident in the video."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The robotic arm moves toward the container and appears to interact with it, suggesting a grasp or placement action.",
    "support_and_stability": "The container is stable on the table, and the robotic arm's movement is smooth and controlled.",
    "force_gravity_friction": "The objects are subject to gravity, and the robotic arm's movement suggests a controlled application of force.",
    "causal_chain": "The robotic arm moves downward and forward, reaching the container and interacting with it.",
    "object_permutation": "The objects remain in their initial positions, with the robotic arm's movement being the only change.",
    "next_action_prediction": "The robotic arm may continue to interact with the container or move to another object."
  },
  "uncertainties": [],
  "comprehensive_t2v_caption": "A robotic arm with a white and black end-effector, featuring a green indicator light at its base, is positioned over a wooden table covered with a blue cloth. The arm begins in a raised position and moves downward and forward toward a gray cylindrical container on the table. The background consists of dark textured walls on the left and a greenish patterned surface on the right. The lighting is dim, creating soft shadows. The robotic arm appears to interact with the container, suggesting a manipulation task. The scene is a 3D animation or simulation, with a clinical and focused atmosphere."
}
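
Reviewer note on key names: in this output the physical_reasoning block uses the key "object_permutation", where the Qwen 3.5 27B and Cosmos 3 Super outputs for the same prompt use "object_permanence". The sketch below shows one way to flag such near-miss keys. The expected key set is inferred from those two sibling outputs, not from a published p08 schema definition, and diff_keys is a hypothetical helper name.

```python
from difflib import get_close_matches

# Assumption: key set inferred from the other p08 outputs in this review,
# not taken from an authoritative schema file.
EXPECTED_PHYSICAL_REASONING_KEYS = {
    "contact_and_grasp",
    "support_and_stability",
    "force_gravity_friction",
    "causal_chain",
    "object_permanence",
    "next_action_prediction",
}


def diff_keys(actual_keys, expected_keys):
    """Report missing and unexpected keys, and pair up probable renames."""
    actual, expected = set(actual_keys), set(expected_keys)
    missing = sorted(expected - actual)
    unexpected = sorted(actual - expected)
    likely_renames = {}
    for key in unexpected:
        # Suggest the closest missing key, if any is similar enough.
        close = get_close_matches(key, missing, n=1)
        if close:
            likely_renames[key] = close[0]
    return {
        "missing": missing,
        "unexpected": unexpected,
        "likely_renames": likely_renames,
    }


if __name__ == "__main__":
    nano_keys = [
        "contact_and_grasp",
        "support_and_stability",
        "force_gravity_friction",
        "causal_chain",
        "object_permutation",
        "next_action_prediction",
    ]
    print(diff_keys(nano_keys, EXPECTED_PHYSICAL_REASONING_KEYS))
```

The same comparison could be applied to the segments entries, where this output uses "physical_reasoning" and the sibling outputs use "physical_logic".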

Cosmos 3 Super (64B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Super (64B) · 18.3s · 2026-06-10T16:10:46+0000

Short Read

The robotic arm moves downward, positioning the object it is holding above the gray cup. It then releases the object into the cup, and the arm remains stationary for the remainder of the video.

Training Caption Field

A white robotic arm with a green indicator light on its base moves downward, holding a small, dark object. It positions the object above a gray cylindrical cup on a wooden table with a textured gray mat. The robot then releases the object into the cup, and the arm remains stationary for the remainder of the video. The background features a dark, patterned wall on the left and a green, textured wall on the right. The lighting is dim, creating a moody atmosphere.

Full output JSON/text
{
  "scene_imagination": "A white robotic arm with a gripper is positioned over a wooden table with a textured gray surface. The robot is holding a small, dark object, possibly a utensil or tool, and is moving it toward a gray cylindrical cup. The background features a dark, patterned wall on the left and a green, textured wall on the right. The lighting is dim, creating a moody atmosphere. The robot's joints are articulated, and it has a green indicator light on its base.",
  "temporal_caption": "The robotic arm moves downward, positioning the object it is holding above the gray cup. It then releases the object into the cup, and the arm remains stationary for the remainder of the video.",
  "audio_description": "No audio is present in the video.",
  "subjects": [
    {
      "description": "A white robotic arm with multiple articulated joints and a gripper at the end.",
      "appearance_details": "The arm is white with black joints and a green indicator light on its base. The gripper is open and holding a small, dark object.",
      "relationship": "The robot is interacting with the gray cup on the table.",
      "location": "Center of the frame, extending from the bottom left to the center.",
      "relative_size": "Large within the frame.",
      "orientation": "Facing the gray cup.",
      "pose": "The arm is extended and angled downward toward the cup.",
      "action": "The robot moves downward, positions the object above the cup, and releases it.",
      "state_changes": "The arm moves from a higher position to a lower position, then remains stationary."
    },
    {
      "description": "A gray cylindrical cup.",
      "appearance_details": "The cup is gray with a smooth surface and a slightly flared rim.",
      "relationship": "The cup is the target object for the robot's action.",
      "location": "Center of the frame, on the table.",
      "relative_size": "Medium within the frame.",
      "orientation": "Upright.",
      "pose": "Stationary on the table.",
      "action": "The cup receives the object dropped by the robot.",
      "state_changes": "No significant change."
    },
    {
      "description": "A small, dark object held by the robot's gripper.",
      "appearance_details": "The object is dark and appears to be a utensil or tool.",
      "relationship": "The object is being manipulated by the robot and dropped into the cup.",
      "location": "Held by the robot's gripper, moving toward the cup.",
      "relative_size": "Small within the frame.",
      "orientation": "Held vertically by the gripper.",
      "pose": "Moving downward.",
      "action": "The object is moved by the robot and released into the cup.",
      "state_changes": "The object moves from the gripper to inside the cup."
    }
  ],
  "background_setting": "The scene takes place indoors, likely in a kitchen or similar setting. The background features a dark, patterned wall on the left and a green, textured wall on the right. The table has a wooden surface with a textured gray mat covering most of it.",
  "lighting": {
    "conditions": "Dim indoor lighting.",
    "direction": "Top-lit, casting shadows downward.",
    "shadows": "Soft shadows are cast by the robot and the cup on the table.",
    "illumination_effect": "The lighting creates a moody atmosphere and highlights the textures of the walls and the table surface."
  },
  "aesthetics": {
    "composition": "The robot and the cup are centrally framed, drawing attention to the interaction between them.",
    "color_scheme": "Muted tones with a contrast between the white robot, the gray cup, and the dark, textured background.",
    "mood_atmosphere": "Quiet and focused, emphasizing the precision of the robot's action.",
    "patterns": "The background walls have distinct patterns, and the table surface has a textured mat."
  },
  "cinematography": {
    "camera_motion": "Static camera throughout the video.",
    "framing": "Medium shot, capturing the robot, the cup, and the immediate surroundings.",
    "camera_angle": "High angle, looking down at the table and the robot's action.",
    "depth_of_field": "Deep depth of field, keeping both the robot and the background in focus.",
    "focus": "Sharp focus on the robot and the cup.",
    "lens_focal_length": "Standard lens."
  },
  "style_medium": "3D animation or simulation.",
  "artistic_style": "Realistic 3D rendering.",
  "context": "A demonstration of a robotic arm performing a simple task, likely for educational or research purposes.",
  "actions": [
    {
      "time": "0-3 seconds",
      "description": "The robotic arm moves downward, positioning the object it is holding above the gray cup.",
      "actor": "Robotic arm",
      "object": "Small, dark object",
      "contact_event": "The object is positioned above the cup.",
      "state_change": "The arm moves from a higher position to a lower position."
    },
    {
      "time": "3-6 seconds",
      "description": "The robot releases the object into the cup.",
      "actor": "Robotic arm",
      "object": "Small, dark object",
      "contact_event": "The object is dropped into the cup.",
      "state_change": "The object moves from the gripper to inside the cup."
    },
    {
      "time": "6-9 seconds",
      "description": "The robotic arm remains stationary after releasing the object.",
      "actor": "Robotic arm",
      "object": "None",
      "contact_event": "No contact.",
      "state_change": "No significant change."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0-3 seconds",
      "description": "The robotic arm moves downward, positioning the object it is holding above the gray cup.",
      "key_changes": "The arm moves from a higher position to a lower position.",
      "camera": "Static high angle shot.",
      "robot_motion": "The arm moves downward.",
      "object_motion": "The object moves downward with the arm.",
      "physical_logic": "The arm moves smoothly and precisely."
    },
    {
      "segment_index": 1,
      "time_range": "3-6 seconds",
      "description": "The robot releases the object into the cup.",
      "key_changes": "The object is dropped into the cup.",
      "camera": "Static high angle shot.",
      "robot_motion": "The arm releases the object.",
      "object_motion": "The object moves from the gripper to inside the cup.",
      "physical_logic": "The object falls into the cup due to gravity."
    },
    {
      "segment_index": 2,
      "time_range": "6-9 seconds",
      "description": "The robotic arm remains stationary after releasing the object.",
      "key_changes": "No significant change.",
      "camera": "Static high angle shot.",
      "robot_motion": "No motion.",
      "object_motion": "No motion.",
      "physical_logic": "The arm remains stationary."
    }
  ],
  "transitions": [],
  "resolution": {
    "W": 1360,
    "H": 688
  },
  "aspect_ratio": "16,9",
  "duration": "9.767s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "A white robotic arm with multiple articulated joints and a gripper at the end.",
    "end_effector": "A gripper holding a small, dark object.",
    "manipulated_objects": [
      "A small, dark object",
      "A gray cylindrical cup"
    ],
    "task_goal": "To drop the object into the cup.",
    "initial_state": "The robot is holding the object above the cup.",
    "terminal_state": "The object is inside the cup, and the robot is stationary.",
    "affordances": "The robot can move its arm and gripper to manipulate objects.",
    "failure_modes": "The object could miss the cup if the robot's movement is imprecise."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The robot's gripper holds the object securely until it is released.",
    "support_and_stability": "The robot's base provides stability for its movements.",
    "force_gravity_friction": "Gravity causes the object to fall into the cup after being released.",
    "causal_chain": "The robot moves the object to the cup, releases it, and the object falls into the cup due to gravity.",
    "object_permanence": "The object is visible throughout the video, moving from the gripper to the cup.",
    "next_action_prediction": "The robot will likely move away from the cup or perform another task."
  },
  "uncertainties": [],
  "comprehensive_t2v_caption": "A white robotic arm with a green indicator light on its base moves downward, holding a small, dark object. It positions the object above a gray cylindrical cup on a wooden table with a textured gray mat. The robot then releases the object into the cup, and the arm remains stationary for the remainder of the video. The background features a dark, patterned wall on the left and a green, textured wall on the right. The lighting is dim, creating a moody atmosphere."
}
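
Reviewer note on time ranges: the outputs for this clip write time ranges in several notations ("00:00 - 00:02", "0:00-0:03", "0-3 seconds", "0.0-3.0s"), which makes the timelines hard to compare directly. The sketch below normalizes these notations to seconds and measures how much of the 9.767s clip is left uncovered after the last range. It is a minimal sketch that handles only the four notations seen here, and the function names are hypothetical.

```python
import re


def _to_seconds(token):
    """Convert 'MM:SS', 'S', 'S seconds', or 'S.s' + 's' to float seconds."""
    token = token.strip().lower()
    # Drop a trailing unit such as ' seconds' or 's'.
    token = re.sub(r"\s*(seconds|second|sec|s)$", "", token)
    if ":" in token:
        minutes, seconds = token.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(token)


def parse_time_range(text):
    """Return (start_sec, end_sec) for a 'start-end' string."""
    parts = re.split(r"\s*-\s*", text.strip(), maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"not a time range: {text!r}")
    return _to_seconds(parts[0]), _to_seconds(parts[1])


def uncovered_tail(ranges, duration_sec):
    """Seconds between the latest range end and the clip duration."""
    latest_end = max(parse_time_range(r)[1] for r in ranges)
    return duration_sec - latest_end


if __name__ == "__main__":
    for sample in ("00:00 - 00:02", "0:03-0:06", "3-6 seconds", "6.0-9.767s"):
        print(sample, parse_time_range(sample))
    print(uncovered_tail(["0-3 seconds", "3-6 seconds", "6-9 seconds"], 9.767))
```

With the p08 ranges above, the last range ends at 9 seconds, so roughly the final 0.77 seconds of the clip has no action or segment entry, while the p09 timeline runs to 9.767s.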

Cosmos 3 Super (64B) / p09: Reproduction schema

ok
p09: Reproduction schema · Cosmos 3 Super (64B) · 19.7s · 2026-06-10T16:11:05+0000

Short Read

A white robotic arm with a gripper carefully places a small object into a dark cylindrical container on a wooden surface.

Training Caption Field

A white robotic arm with a gripper carefully places a small object into a dark cylindrical container on a wooden surface. The robotic arm is positioned on the left side of the frame, extending towards the center where the container is located. The background consists of a dark, textured wall on the left and a green, patterned wall on the right. The lighting is dim, with a slight green tint. The robotic arm moves downwards and slightly forward to position the small object above the container, then places the object into the container and retracts slightly. The container is stationary on the wooden surface, and the robotic arm has a green light on its base and a blue light on its end effector.

Full output JSON/text
{
  "schema_version": "p09_reproduction_v1",
  "one_sentence_summary": "A white robotic arm with a gripper carefully places a small object into a dark cylindrical container on a wooden surface.",
  "video_metadata": {
    "resolution": {
      "W": 1360,
      "H": 688
    },
    "aspect_ratio": "16,9",
    "duration_sec": 9.767,
    "fps": 30.0,
    "frame_count": 293
  },
  "camera": {
    "viewpoint": "High-angle, slightly side-on view of the workspace.",
    "framing": "Medium shot focusing on the robotic arm and the container.",
    "camera_angle": "High angle",
    "camera_motion": "Static",
    "estimated_height_or_distance": "Approximately 1 meter above the workspace.",
    "lens_or_distortion": "Standard lens, no visible distortion.",
    "crop_boundaries": "Full workspace visible within frame.",
    "stability": "Static, no camera shake or movement."
  },
  "scene_layout": {
    "coordinate_frame": "Right-handed coordinate system with origin near the base of the robotic arm.",
    "background": "A dark, textured wall on the left and a green, patterned wall on the right. The workspace is a wooden surface with a textured mat.",
    "surface": "Wooden surface with a textured mat.",
    "lighting": "Dim, ambient lighting with a slight green tint.",
    "global_spatial_map": "The robotic arm is positioned on the left side of the frame, extending towards the center where the container is located.",
    "occlusions": "The robotic arm partially occludes the view of the container at times."
  },
  "subjects": [
    {
      "id": "robotic_arm",
      "category": "robot",
      "appearance": "White, multi-jointed robotic arm with a gripper at the end. It has a green light on its base and a blue light on its end effector.",
      "visible_parts": "Base, joints, gripper, lights.",
      "initial_pose": "The robotic arm is extended towards the container, holding a small object.",
      "final_pose": "The robotic arm is retracted slightly, with the gripper empty.",
      "workspace_location": "Left side of the frame, extending towards the center.",
      "motion_path": "The robotic arm moves downwards and slightly forward to place the object into the container, then retracts.",
      "end_effector": "A gripper holding a small object.",
      "uncertainties": "The exact nature of the small object being manipulated is unclear."
    }
  ],
  "objects": [
    {
      "id": "container",
      "category": "cylindrical_container",
      "appearance": "Dark, cylindrical container with a flat top.",
      "label_or_text": "None",
      "material": "Appears to be plastic or metal.",
      "shape": "Cylindrical",
      "size_relative_to_frame": "Medium",
      "initial_pose": "Stationary on the wooden surface.",
      "final_pose": "Stationary on the wooden surface, with the small object inside.",
      "state_changes": "The container receives the small object from the robotic arm.",
      "uncertainties": "The exact material of the container is unclear."
    }
  ],
  "initial_state": "The robotic arm is holding a small object above the container.",
  "final_state": "The robotic arm has placed the small object into the container and retracted slightly.",
  "timeline": [
    {
      "time_range": "0.0-3.0s",
      "description": "The robotic arm moves downwards and slightly forward, positioning the small object above the container.",
      "active_subjects": [
        "robotic_arm"
      ],
      "object_states": {
        "container": "Stationary",
        "robotic_arm": "Moving downwards and forward"
      },
      "robot_motion": "The robotic arm moves downwards and slightly forward.",
      "object_motion": "None",
      "gripper_state": "Holding a small object",
      "contact_state": "The gripper is positioned above the container.",
      "spatial_changes": "The robotic arm moves closer to the container."
    },
    {
      "time_range": "3.0-6.0s",
      "description": "The robotic arm places the small object into the container.",
      "active_subjects": [
        "robotic_arm"
      ],
      "object_states": {
        "container": "Receives the small object",
        "robotic_arm": "Moving downwards"
      },
      "robot_motion": "The robotic arm moves downwards to place the object.",
      "object_motion": "The small object is placed into the container.",
      "gripper_state": "Releases the small object",
      "contact_state": "The gripper makes contact with the container.",
      "spatial_changes": "The small object is transferred from the gripper to the container."
    },
    {
      "time_range": "6.0-9.767s",
      "description": "The robotic arm retracts slightly, leaving the small object inside the container.",
      "active_subjects": [
        "robotic_arm"
      ],
      "object_states": {
        "container": "Stationary with the small object inside",
        "robotic_arm": "Retracting"
      },
      "robot_motion": "The robotic arm retracts slightly.",
      "object_motion": "None",
      "gripper_state": "Empty",
      "contact_state": "The gripper is no longer in contact with the container.",
      "spatial_changes": "The robotic arm moves away from the container."
    }
  ],
  "parallel_action_graph": {
    "simultaneous_actions": [],
    "dependencies": [
      {
        "parent": "The robotic arm must move downwards and forward to position the object above the container.",
        "child": "The robotic arm can then place the object into the container."
      },
      {
        "parent": "The robotic arm must place the object into the container.",
        "child": "The robotic arm can then retract."
      }
    ],
    "ordering_constraints": [
      "The robotic arm moves downwards and forward before placing the object.",
      "The robotic arm places the object before retracting."
    ]
  },
  "keyframes": [
    {
      "time": 0.0,
      "frame_description": "The robotic arm is holding a small object above the container.",
      "subject_poses": {
        "robotic_arm": "Extended towards the container, holding a small object."
      },
      "object_poses": {
        "container": "Stationary on the wooden surface."
      },
      "contacts": {
        "robotic_arm": "Holding a small object",
        "container": "None"
      },
      "important_visual_details": "The robotic arm is positioned above the container."
    },
    {
      "time": 3.0,
      "frame_description": "The robotic arm is positioned directly above the container, ready to place the object.",
      "subject_poses": {
        "robotic_arm": "Extended downwards, holding a small object."
      },
      "object_poses": {
        "container": "Stationary on the wooden surface."
      },
      "contacts": {
        "robotic_arm": "Holding a small object",
        "container": "None"
      },
      "important_visual_details": "The robotic arm is positioned directly above the container."
    },
    {
      "time": 6.0,
      "frame_description": "The robotic arm has placed the small object into the container.",
      "subject_poses": {
        "robotic_arm": "Retracted slightly, gripper empty."
      },
      "object_poses": {
        "container": "Stationary on the wooden surface, with the small object inside."
      },
      "contacts": {
        "robotic_arm": "Empty",
        "container": "None"
      },
      "important_visual_details": "The small object is inside the container."
    },
    {
      "time": 9.767,
      "frame_description": "The robotic arm is retracted, leaving the small object inside the container.",
      "subject_poses": {
        "robotic_arm": "Retracted, gripper empty."
      },
      "object_poses": {
        "container": "Stationary on the wooden surface, with the small object inside."
      },
      "contacts": {
        "robotic_arm": "Empty",
        "container": "None"
      },
      "important_visual_details": "The robotic arm is retracted, leaving the small object inside the container."
    }
  ],
  "contact_events": [
    {
      "time": 3.0,
      "subject_id": "robotic_arm",
      "object_id": "container",
      "contact_type": "Gripper makes contact with the container.",
      "contact_location": "Above the container.",
      "grasp_or_support_quality": "The gripper is holding a small object.",
      "release_or_transfer": "The small object is transferred from the gripper to the container.",
      "visible_evidence": "The small object is placed into the container."
    }
  ],
  "physics_and_affordances": {
    "support_surfaces": "The wooden surface supports the container.",
    "gravity_and_stability": "The container is stable on the wooden surface.",
    "force_or_constraint_reasoning": "The robotic arm applies a downward force to place the object into the container.",
    "collision_or_near_collision": "The robotic arm makes contact with the container.",
    "object_permanence": "The small object is transferred from the gripper to the container.",
    "task_goal": "The robotic arm is placing a small object into the container.",
    "failure_modes": "The robotic arm could fail to place the object into the container if it does not apply enough force or if the object is too large.",
    "next_action_prediction": "The robotic arm will likely retract further or move to a different position."
  },
  "appearance_fidelity": {
    "colors": "The robotic arm is white, the container is dark, and the background is a mix of dark and green colors.",
    "textures": "The wooden surface has a textured mat, and the background walls have a textured pattern.",
    "lighting_and_shadows": "The lighting is dim, with a slight green tint, and there are soft shadows cast by the robotic arm and the container.",
    "robot_morphology": "The robotic arm is a multi-jointed arm with a gripper at the end.",
    "object_fine_details": "The container has a flat top, and the robotic arm has a green light on its base and a blue light on its end effector.",
    "background_fine_details": "The background walls have a textured pattern, and the wooden surface has a textured mat."
  },
  "text_and_labels": [],
  "audio_description": "None",
  "generation_controls": {
    "shot_type": "Static medium shot",
    "camera_motion_instruction": "Static",
    "motion_style": "Realistic",
    "duration_instruction": "9.767 seconds",
    "spatial_consistency_requirements": "The robotic arm and the container must remain in the same relative positions throughout the video.",
    "negative_constraints": "The robotic arm must not collide with the container or the background.",
    "recommended_caption_style": "Descriptive"
  },
  "uncertainties": [
    "The exact nature of the small object being manipulated is unclear.",
    "The exact material of the container is unclear."
  ],
  "comprehensive_reproduction_caption": "A white robotic arm with a gripper carefully places a small object into a dark cylindrical container on a wooden surface. The robotic arm is positioned on the left side of the frame, extending towards the center where the container is located. The background consists of a dark, textured wall on the left and a green, patterned wall on the right. The lighting is dim, with a slight green tint. The robotic arm moves downwards and slightly forward to position the small object above the container, then places the object into the container and retracts slightly. The container is stationary on the wooden surface, and the robotic arm has a green light on its base and a blue light on its end effector."
}
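
Reviewer note: the p09 timeline above splits the clip into "0.0-3.0s", "3.0-6.0s" and "6.0-9.767s", and generation_controls reports a duration of 9.767 seconds. A minimal sketch of how that consistency could be checked automatically is given below. The helper names (parse_range, timeline_problems) and the tolerance are hypothetical and not part of the review pipeline; the parser assumes the "start-end" plus trailing "s" format seen in this one output.

```python
import re


def parse_range(time_range: str) -> tuple[float, float]:
    """Parse a p09-style range such as '6.0-9.767s' into (start, end) seconds."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)s\s*", time_range)
    if m is None:
        raise ValueError(f"unrecognized time_range: {time_range!r}")
    return float(m.group(1)), float(m.group(2))


def timeline_problems(timeline: list[dict], duration: float, tol: float = 1e-3) -> list[str]:
    """Return human-readable problems: gaps, overlaps, or a mismatched end time."""
    problems = []
    cursor = 0.0  # where the next segment is expected to start
    for i, seg in enumerate(timeline):
        start, end = parse_range(seg["time_range"])
        if abs(start - cursor) > tol:
            problems.append(f"segment {i}: starts at {start}, expected {cursor}")
        if end <= start:
            problems.append(f"segment {i}: empty or reversed range")
        cursor = end
    if abs(cursor - duration) > tol:
        problems.append(f"timeline ends at {cursor}, clip duration is {duration}")
    return problems


# Ranges copied from the timeline above; only the time_range field is needed.
timeline = [
    {"time_range": "0.0-3.0s"},
    {"time_range": "3.0-6.0s"},
    {"time_range": "6.0-9.767s"},
]
print(timeline_problems(timeline, duration=9.767))
```

For this output the list of problems is empty: the three segments are contiguous and end at the stated duration. The check says nothing about whether the described actions match the video.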

rovid_003

Qwen 3.5 9B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 9B · 79.3s · 2026-06-10T17:44:51+0000

Short Read

At the beginning, the robot arm holds a brush over a pink bowl. In the early phase, the gripper dips the brush into the liquid. By the middle, the arm lifts the brush and moves it towards the bread. Towards the end, the robot applies the liquid to the surface of the bread slices in the tray.

Training Caption Field

A robotic arm with a black gripper is positioned over a stainless steel kitchen counter next to a black microwave. The robot holds a wooden pastry brush and dips it into a pink bowl containing a reddish-orange liquid. After saturating the brush, the robot moves it to a rectangular aluminum tray holding two slices of bread and begins to brush the liquid onto the bread surface. The scene is brightly lit with reflections visible on the metallic surfaces, and the camera remains static throughout the manipulation task.

Full output JSON/text
{
  "scene_imagination": "A high-angle, static shot captures a stainless steel kitchen counter where a black microwave oven sits on the left. To the right of the microwave is a rectangular aluminum foil tray containing two slices of bread. Further right is a pink bowl filled with a reddish-orange liquid. A robotic arm with a black gripper enters from the right, holding a wooden pastry brush, and dips it into the bowl before moving to the bread.",
  "temporal_caption": "At the beginning, the robot arm holds a brush over a pink bowl. In the early phase, the gripper dips the brush into the liquid. By the middle, the arm lifts the brush and moves it towards the bread. Towards the end, the robot applies the liquid to the surface of the bread slices in the tray.",
  "audio_description": null,
  "subjects": [
    {
      "description": "Robot arm and gripper",
      "appearance_details": "Black mechanical arm with visible joints and a two-fingered gripper holding a wooden brush.",
      "relationship": "Manipulator of the brush and bread.",
      "location": "Right side of the frame, moving towards the center.",
      "relative_size": "Large compared to the bread slices.",
      "orientation": "Angled downwards towards the tray.",
      "pose": "Extended arm, gripper holding the brush.",
      "action": "Dipping brush in liquid and brushing bread.",
      "state_changes": "Moves from holding brush over bowl to brushing bread."
    },
    {
      "description": "Bread slices",
      "appearance_details": "Two round slices of bread in a foil tray.",
      "relationship": "Object being manipulated.",
      "location": "Center-right of the counter.",
      "relative_size": "Small, standard bread size.",
      "orientation": "Flat, lying in the tray.",
      "pose": "Stationary.",
      "action": "Being brushed with liquid.",
      "state_changes": "Surface becomes wet with liquid."
    },
    {
      "description": "Pink bowl",
      "appearance_details": "Small, round bowl containing reddish-orange liquid.",
      "relationship": "Source of liquid.",
      "location": "Right side of the counter.",
      "relative_size": "Small.",
      "orientation": "Stationary on the counter.",
      "pose": "Stationary.",
      "action": "Being dipped into by the brush.",
      "state_changes": "Liquid level slightly decreases."
    }
  ],
  "background_setting": "A modern kitchen environment featuring a stainless steel countertop. Behind the counter is a wooden surface with various items including a bottle of oil, a carton of milk, and a plant. Orange chairs are visible in the far background.",
  "lighting": {
    "conditions": "Bright, indoor artificial lighting.",
    "direction": "Overhead, creating reflections on the metallic surfaces.",
    "shadows": "Soft shadows cast by the objects on the counter.",
    "illumination_effect": "High contrast on the shiny metal and glass surfaces."
  },
  "aesthetics": {
    "composition": "Centered composition with the microwave on the left and the action on the right.",
    "color_scheme": "Neutral tones of steel and black, accented by the pink bowl and orange chairs.",
    "mood_atmosphere": "Clinical, functional, and precise.",
    "patterns": "None significant."
  },
  "cinematography": {
    "framing": "Medium shot, capturing the workspace and objects.",
    "camera_angle": "High angle, looking down at the counter.",
    "depth_of_field": "Deep focus, keeping all objects sharp.",
    "focus": "Sharp focus on the robot arm and bread.",
    "lens_focal_length": "Standard lens, likely 35mm equivalent.",
    "camera_motion": "Static."
  },
  "style_medium": "robotics video",
  "artistic_style": "real-world robot manipulation footage",
  "context": "The robot is performing a cooking task, specifically brushing a liquid (likely egg wash or sauce) onto bread slices, likely in preparation for toasting or baking.",
  "actions": [
    {
      "time": "0.0s - 1.0s",
      "description": "The robot arm holds the brush over the pink bowl.",
      "actor": "Robot arm",
      "object": "Brush, Bowl",
      "contact_event": "Brush hovers near bowl.",
      "state_change": "None.",
      "evidence": "Visual position of brush relative to bowl."
    },
    {
      "time": "1.0s - 2.0s",
      "description": "The robot dips the brush into the liquid.",
      "actor": "Robot arm",
      "object": "Brush, Liquid",
      "contact_event": "Brush touches liquid surface.",
      "state_change": "Brush becomes saturated with liquid.",
      "evidence": "Visual change in brush appearance and liquid level."
    },
    {
      "time": "2.0s - 4.0s",
      "description": "The robot moves the brush towards the bread.",
      "actor": "Robot arm",
      "object": "Brush, Bread",
      "contact_event": "Brush approaches bread.",
      "state_change": "Brush moves from bowl to tray.",
      "evidence": "Motion of the arm and brush."
    },
    {
      "time": "4.0s - 6.0s",
      "description": "The robot brushes the bread slices.",
      "actor": "Robot arm",
      "object": "Brush, Bread",
      "contact_event": "Brush contacts bread surface.",
      "state_change": "Bread surface becomes wet.",
      "evidence": "Visual change on bread surface and brush motion."
    }
  ],
  "text_and_signage_elements": [
    {
      "text": "Unidentifiable labels on microwave and milk carton.",
      "category": "Product packaging",
      "appearance": "Small, blurry text on white and blue labels.",
      "spatial": "On the top of the microwave and on the milk carton in the background.",
      "spatial_temporal": "Static throughout the video.",
      "context": "Commercial product labels."
    }
  ],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "0.0s - 2.0s",
      "description": "The robot prepares to brush the bread by dipping the brush into the liquid.",
      "key_changes": "Brush enters liquid.",
      "camera": "Static.",
      "robot_motion": "Arm extends and retracts slightly.",
      "object_motion": "Bowl remains stationary.",
      "physical_logic": "Robot grasps brush, moves to liquid source, saturates tool."
    },
    {
      "segment_index": 2,
      "time_range": "2.0s - 6.0s",
      "description": "The robot applies the liquid to the bread slices.",
      "key_changes": "Brush contacts bread, liquid transfers.",
      "camera": "Static.",
      "robot_motion": "Arm moves laterally and vertically over the tray.",
      "object_motion": "Bread remains stationary, surface changes.",
      "physical_logic": "Robot transfers liquid from tool to target object."
    }
  ],
  "transitions": [
    "Dipping action to brushing action."
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "4:3",
  "duration": "6.300s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Dual-arm robotic system (one arm visible manipulating, one idle on left).",
    "end_effector": "Two-fingered gripper holding a wooden pastry brush.",
    "manipulated_objects": "Pastry brush, bread slices.",
    "task_goal": "Brush liquid onto bread slices.",
    "initial_state": "Robot holding brush over bowl.",
    "terminal_state": "Robot brushing the bread.",
    "affordances": "Gripper provides stable hold for brush; brush provides surface contact for liquid transfer.",
    "failure_modes": "None observed."
  },
  "physical_reasoning": {
    "contact_and_grasp": "Gripper securely holds the brush handle.",
    "support_and_stability": "Bowl and tray are stable on the counter.",
    "force_gravity_friction": "Gravity pulls liquid down; friction allows brush to hold liquid.",
    "causal_chain": "Dipping -> Saturation -> Contact -> Transfer.",
    "object_permanence": "All objects remain present throughout the clip.",
    "next_action_prediction": "Robot may continue brushing or move the tray to the microwave."
  },
  "uncertainties": [
    "Exact type of liquid in the bowl."
  ],
  "comprehensive_t2v_caption": "A robotic arm with a black gripper is positioned over a stainless steel kitchen counter next to a black microwave. The robot holds a wooden pastry brush and dips it into a pink bowl containing a reddish-orange liquid. After saturating the brush, the robot moves it to a rectangular aluminum tray holding two slices of bread and begins to brush the liquid onto the bread surface. The scene is brightly lit with reflections visible on the metallic surfaces, and the camera remains static throughout the manipulation task."
}
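
Reviewer note: the output above reports a resolution of 1152 x 816 together with an aspect_ratio of "4:3". These two fields can be cross-checked with simple arithmetic, sketched below. The function names and the 1% tolerance are assumptions made for illustration, not part of either schema.

```python
from math import gcd


def reduced_ratio(width: int, height: int) -> tuple[int, int]:
    """Reduce width:height to lowest terms."""
    g = gcd(width, height)
    return width // g, height // g


def matches_reported(width: int, height: int, reported: str, rel_tol: float = 0.01) -> bool:
    """True if a reported 'W:H' string is within rel_tol of the pixel ratio."""
    w, h = (int(part) for part in reported.split(":"))
    actual = width / height
    return abs(actual - w / h) <= rel_tol * actual


print(reduced_ratio(1152, 816))            # (24, 17)
print(matches_reported(1152, 816, "4:3"))  # False
```

1152 x 816 reduces to 24:17 (about 1.41), so the reported "4:3" (about 1.33) does not agree with the reported resolution under this tolerance. A check of this kind only flags internal inconsistency between the two fields; it does not establish which one is correct for the source video.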

Qwen 3.5 27B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 27B · 202.0s · 2026-06-10T18:05:27+0000

Short Read

At the beginning, the robotic arm is positioned over the pink bowl, holding a wooden spatula. In the early phase, the arm lifts the spatula, which has a yellow food item attached to it, out of the bowl. During the middle phase, the arm moves the spatula over to the silver foil tray. In the late phase, the arm carefully places the food item onto the tray next to the other item. By the end, the arm retracts slightly, leaving the two food items side-by-side on the tray.

Training Caption Field

A black robotic arm with a multi-jointed gripper operates on a stainless steel kitchen counter. Initially, the arm holds a wooden spatula over a pink bowl containing a reddish liquid. The spatula carries a round, yellow food item. The arm lifts the spatula, moving the food item out of the bowl and over to a silver foil tray positioned centrally. The tray already holds one similar food item. The robot carefully maneuvers the spatula to place the second food item onto the tray next to the first one. Once placed, the arm retracts slightly. To the left of the tray sits a black microwave oven. The background includes a wooden counter with various bottles and a carton. The scene is brightly lit, highlighting the reflective surfaces of the counter and tray.

Full output JSON/text
{
  "scene_imagination": "A static, high-angle shot captures a kitchen countertop scene. On the left sits a black microwave oven. In the center, a silver foil tray holds two round, yellow food items, possibly potato slices or patties. To the right, a pink bowl contains a reddish liquid. A black robotic arm with a multi-jointed gripper enters from the right, holding a wooden spatula. The background features a wooden counter with various bottles and a carton, suggesting a domestic or lab kitchen environment. The lighting is bright and even, reflecting off the stainless steel surface.",
  "temporal_caption": "At the beginning, the robotic arm is positioned over the pink bowl, holding a wooden spatula. In the early phase, the arm lifts the spatula, which has a yellow food item attached to it, out of the bowl. During the middle phase, the arm moves the spatula over to the silver foil tray. In the late phase, the arm carefully places the food item onto the tray next to the other item. By the end, the arm retracts slightly, leaving the two food items side-by-side on the tray.",
  "audio_description": null,
  "subjects": [
    {
      "description": "A black robotic arm with a complex, multi-jointed gripper.",
      "appearance_details": "Black matte finish, metallic joints, blue indicator light on the forearm.",
      "relationship": "Manipulator interacting with food items.",
      "location": "Enters from the right side of the frame.",
      "relative_size": "Large, dominating the right foreground.",
      "orientation": "Facing left towards the tray.",
      "pose": "Arm extended, gripper holding a tool.",
      "action": "Transferring a food item from a bowl to a tray.",
      "state_changes": "Moves from holding position over bowl to placement position over tray."
    },
    {
      "description": "A wooden spatula.",
      "appearance_details": "Light brown wood, flat head.",
      "relationship": "Tool held by the robot.",
      "location": "Held by the robotic gripper.",
      "relative_size": "Small relative to the arm.",
      "orientation": "Angled downwards.",
      "pose": "Static relative to the gripper.",
      "action": "Used to transport a food item.",
      "state_changes": "Moves from bowl to tray."
    },
    {
      "description": "Two round, yellow food items.",
      "appearance_details": "Flat, circular, yellowish-brown color, resembling potato slices or patties.",
      "relationship": "Objects being manipulated.",
      "location": "One initially in the bowl (on spatula), one already on the tray.",
      "relative_size": "Small.",
      "orientation": "Flat.",
      "pose": "Resting on surfaces.",
      "action": "Being placed on the tray.",
      "state_changes": "One item moves from the spatula to the tray."
    },
    {
      "description": "A silver foil tray.",
      "appearance_details": "Rectangular, metallic, reflective.",
      "relationship": "Receptacle for the food.",
      "location": "Center of the countertop.",
      "relative_size": "Medium.",
      "orientation": "Flat on the counter.",
      "pose": "Static.",
      "action": "Receiving a food item.",
      "state_changes": "Goes from holding one item to holding two."
    },
    {
      "description": "A pink bowl.",
      "appearance_details": "Round, pink ceramic or plastic, containing reddish liquid.",
      "relationship": "Source of the food item.",
      "location": "Right of the tray.",
      "relative_size": "Medium.",
      "orientation": "Static.",
      "pose": "Static.",
      "action": "Source of the food item.",
      "state_changes": "Loses one food item."
    }
  ],
  "background_setting": "A modern kitchen or laboratory setting with a stainless steel countertop. In the background, there is a wooden counter with various items including a carton of milk or cream, a bottle of oil, and a dark bottle. Chairs are visible further back.",
  "lighting": {
    "conditions": "Bright, artificial indoor lighting.",
    "direction": "Overhead and slightly from the front.",
    "shadows": "Soft shadows cast by the robot arm and objects onto the counter.",
    "illumination_effect": "Creates reflections on the microwave and foil tray."
  },
  "aesthetics": {
    "composition": "Balanced with the microwave on the left and the action on the right.",
    "color_scheme": "Neutral tones (black, silver, wood) with pops of color from the pink bowl and yellow food.",
    "mood_atmosphere": "Clinical, precise, and functional.",
    "patterns": "Geometric shapes of the microwave and tray contrast with the organic shapes of the food."
  },
  "cinematography": {
    "framing": "Medium shot, capturing the workspace.",
    "camera_angle": "High angle, looking down at the counter.",
    "depth_of_field": "Deep, keeping both foreground and background relatively sharp.",
    "focus": "Sharp focus on the robot arm and the tray.",
    "lens_focal_length": "Standard to wide-angle.",
    "camera_motion": "Static."
  },
  "style_medium": "robotics video",
  "artistic_style": "real-world robot manipulation footage",
  "context": "The inferred task is food preparation, specifically arranging food items on a tray, likely for cooking in the adjacent microwave.",
  "actions": [
    {
      "time": "00:00 - 00:02",
      "description": "The robotic arm lifts a wooden spatula with a food item out of a pink bowl.",
      "actor": "Robotic arm",
      "object": "Spatula and food item",
      "contact_event": "Gripper holding spatula, spatula holding food.",
      "state_change": "Food item moves from bowl to air.",
      "evidence": "Visual movement of the arm and spatula."
    },
    {
      "time": "00:02 - 00:05",
      "description": "The robotic arm moves the spatula over the foil tray and places the food item down.",
      "actor": "Robotic arm",
      "object": "Spatula and food item",
      "contact_event": "Food item released onto tray.",
      "state_change": "Food item moves from spatula to tray.",
      "evidence": "Visual placement of the item next to the other one."
    },
    {
      "time": "00:05 - 00:06",
      "description": "The robotic arm retracts slightly after placing the item.",
      "actor": "Robotic arm",
      "object": "Spatula",
      "contact_event": "None.",
      "state_change": "Arm position changes.",
      "evidence": "Visual retraction of the arm."
    }
  ],
  "text_and_signage_elements": [
    {
      "text": "Unknown text on microwave label",
      "category": "Label",
      "appearance": "White sticker with blue and black text on the top right of the microwave.",
      "spatial": "Top right of the microwave.",
      "spatial_temporal": "Static throughout.",
      "context": "Product information."
    },
    {
      "text": "Unknown text on carton",
      "category": "Packaging",
      "appearance": "Blue and white carton in the background.",
      "spatial": "Background on the wooden counter.",
      "spatial_temporal": "Static throughout.",
      "context": "Food product packaging."
    }
  ],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "00:00 - 00:02",
      "description": "Robot arm lifts spatula with food from bowl.",
      "key_changes": "Food item leaves the bowl.",
      "camera": "Static.",
      "robot_motion": "Upward and leftward movement.",
      "object_motion": "Food item moves with spatula.",
      "physical_logic": "Gravity holds food on spatula."
    },
    {
      "segment_index": 2,
      "time_range": "00:02 - 00:05",
      "description": "Robot arm positions spatula over tray and releases food.",
      "key_changes": "Food item is placed on the tray.",
      "camera": "Static.",
      "robot_motion": "Precise placement movement.",
      "object_motion": "Food item settles on tray.",
      "physical_logic": "Friction and gravity allow food to stay on tray."
    },
    {
      "segment_index": 3,
      "time_range": "00:05 - 00:06",
      "description": "Robot arm retracts.",
      "key_changes": "Task completion, arm moves away.",
      "camera": "Static.",
      "robot_motion": "Retraction.",
      "object_motion": "None.",
      "physical_logic": "Clearance for next action."
    }
  ],
  "transitions": [
    "Lifting food from bowl",
    "Transferring food to tray",
    "Releasing food on tray"
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "16:9",
  "duration": "6.300s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Black articulated robotic arm with a multi-fingered gripper.",
    "end_effector": "Multi-jointed gripper holding a wooden spatula.",
    "manipulated_objects": "Wooden spatula, yellow food items.",
    "task_goal": "Transfer a food item from a bowl to a tray.",
    "initial_state": "Food item in bowl, one item on tray.",
    "terminal_state": "Two food items on tray.",
    "affordances": "Gripping, lifting, placing.",
    "failure_modes": "Dropping the food, inaccurate placement."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The gripper firmly holds the handle of the spatula. The spatula supports the food item via friction and surface contact.",
    "support_and_stability": "The tray provides a stable surface for the food items. The counter supports all objects.",
    "force_gravity_friction": "Gravity pulls the food down; friction between the spatula and food prevents it from sliding off during transport. Friction between food and tray keeps it in place after release.",
    "causal_chain": "Arm moves -> Spatula moves -> Food moves -> Food released -> Food rests on tray.",
    "object_permanence": "Objects remain visible and consistent throughout the clip.",
    "next_action_prediction": "The robot might repeat the action for more items or move the tray towards the microwave."
  },
  "uncertainties": [
    "The exact nature of the food items is unclear (potato slices, patties, etc.).",
    "The contents of the pink bowl besides the liquid are not fully visible."
  ],
  "comprehensive_t2v_caption": "A black robotic arm with a multi-jointed gripper operates on a stainless steel kitchen counter. Initially, the arm holds a wooden spatula over a pink bowl containing a reddish liquid. The spatula carries a round, yellow food item. The arm lifts the spatula, moving the food item out of the bowl and over to a silver foil tray positioned centrally. The tray already holds one similar food item. The robot carefully maneuvers the spatula to place the second food item onto the tray next to the first one. Once placed, the arm retracts slightly. To the left of the tray sits a black microwave oven. The background includes a wooden counter with various bottles and a carton. The scene is brightly lit, highlighting the reflective surfaces of the counter and tray."
}
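
Reviewer note: the outputs so far write time ranges in several formats: "0.0s - 1.0s" (Qwen 3.5 9B, p08), "00:00 - 00:02" (Qwen 3.5 27B, p08) and "0.0-3.0s" (p09 timeline). To compare segment boundaries across models, the ranges need a common representation. The sketch below converts each to a (start, end) pair in seconds. The function names are hypothetical, and reading "00:02" as minutes:seconds is an assumption; neither schema states it.

```python
import re

# Optional "minutes:" prefix, then seconds (integer or decimal), optional trailing "s".
_TOKEN = re.compile(r"(?:(\d+):)?(\d+(?:\.\d+)?)s?")


def to_seconds(token: str) -> float:
    """Convert '1.0s', '9.767' or '00:02' to seconds (assumes mm:ss for colon forms)."""
    m = _TOKEN.fullmatch(token.strip())
    if m is None:
        raise ValueError(f"unrecognized time token: {token!r}")
    minutes = int(m.group(1)) if m.group(1) else 0
    return minutes * 60 + float(m.group(2))


def normalize_range(text: str) -> tuple[float, float]:
    """Split a 'start - end' range on the first hyphen and convert both ends."""
    start, sep, end = text.partition("-")
    if not sep:
        raise ValueError(f"no range separator in: {text!r}")
    return to_seconds(start), to_seconds(end)


for raw in ["0.0s - 1.0s", "00:02 - 00:05", "6.0-9.767s"]:
    print(raw, "->", normalize_range(raw))
```

With the ranges normalized this way, the Qwen 3.5 9B output for rovid_003 places the transition from dipping to brushing at 2.0 s, while the Qwen 3.5 27B output places its lift-to-transfer boundary at 2.0 s as well, although the two describe different actions.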

Cosmos 3 Nano (16B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Nano (16B) · 14.1s · 2026-06-10T18:36:04+0000

Short Read

The robot arm moves the wooden brush towards the baking tray and begins to brush the surface of one of the potato halves. The arm then moves the brush to the other potato half and repeats the brushing motion. The robot arm remains in this position for the duration of the video.

Training Caption Field

A robot arm with a gripper holding a wooden brush is seen moving towards a baking tray containing two potato halves. The robot arm brushes the surface of one of the potato halves, then moves the brush to the other potato half and repeats the brushing motion. The scene takes place in a kitchen with a black oven on the left, a pink bowl containing oil on the right, and a wooden shelf in the background with a bottle of oil, a bottle of sauce, and a plant. The lighting is bright and even, and the camera angle is high, providing a clear view of the robot arm's actions.

Full output JSON/text
{
  "scene_imagination": "A kitchen countertop scene with a black oven on the left, a baking tray with two potato halves in the center, and a pink bowl with oil on the right. A robot arm with a gripper is visible on the right, holding a wooden brush. In the background, there's a wooden shelf with a bottle of oil, a bottle of sauce, and a plant. The setting appears to be a modern, well-lit kitchen with white tiled walls.",
  "temporal_caption": "The robot arm moves the wooden brush towards the baking tray and begins to brush the surface of one of the potato halves. The arm then moves the brush to the other potato half and repeats the brushing motion. The robot arm remains in this position for the duration of the video.",
  "audio_description": "The sound of the robot arm moving and the brush brushing against the potato.",
  "subjects": [
    {
      "description": "A black oven with a silver handle and control panel.",
      "appearance_details": "The oven has a glossy black finish and a silver handle. There are stickers on the top surface.",
      "relationship": "Located to the left of the baking tray.",
      "location": "Left side of the frame.",
      "relative_size": "Large within frame.",
      "orientation": "Facing the camera.",
      "pose": "Stationary.",
      "action": "Remains stationary throughout the video.",
      "state_changes": "No significant change."
    },
    {
      "description": "A baking tray containing two potato halves.",
      "appearance_details": "The tray is metallic and rectangular. The potatoes are cut in half and placed cut-side up.",
      "relationship": "Located in the center of the frame, between the oven and the robot arm.",
      "location": "Center of the frame.",
      "relative_size": "Medium within frame.",
      "orientation": "Facing the camera.",
      "pose": "Stationary.",
      "action": "Remains stationary throughout the video.",
      "state_changes": "No significant change."
    },
    {
      "description": "A pink bowl containing oil.",
      "appearance_details": "The bowl is pink and round. It contains a red liquid, likely oil.",
      "relationship": "Located to the right of the baking tray.",
      "location": "Right side of the frame.",
      "relative_size": "Small within frame.",
      "orientation": "Facing the camera.",
      "pose": "Stationary.",
      "action": "Remains stationary throughout the video.",
      "state_changes": "No significant change."
    },
    {
      "description": "A robot arm with a gripper holding a wooden brush.",
      "appearance_details": "The robot arm is black and metallic. The gripper is black and has a mechanical design. The brush is wooden with a handle.",
      "relationship": "Located on the right side of the frame, interacting with the baking tray.",
      "location": "Right side of the frame.",
      "relative_size": "Medium within frame.",
      "orientation": "Facing the baking tray.",
      "pose": "Moving towards the baking tray.",
      "action": "Moves the brush towards the baking tray and begins to brush the surface of one of the potato halves.",
      "state_changes": "Moves from the right side of the frame towards the center, then remains in position."
    }
  ],
  "background_setting": "A kitchen countertop with a black oven on the left, a baking tray in the center, and a pink bowl on the right. In the background, there is a wooden shelf with a bottle of oil, a bottle of sauce, and a plant. The walls are white tiled.",
  "lighting": {
    "conditions": "Bright indoor lighting.",
    "direction": "Top-lit.",
    "shadows": "Soft shadows on the countertop.",
    "illumination_effect": "Even illumination across the scene."
  },
  "aesthetics": {
    "composition": "Centered framing with the baking tray in the middle.",
    "color_scheme": "Neutral tones with pops of color from the pink bowl and the red liquid in the bowl.",
    "mood_atmosphere": "Functional and focused.",
    "patterns": "Grid pattern on the tiled walls."
  },
  "cinematography": {
    "camera_angle": "High angle.",
    "framing": "Wide shot.",
    "depth_of_field": "Deep.",
    "focus": "Sharp focus on the baking tray and the robot arm.",
    "lens_focal_length": "Standard.",
    "camera_motion": "Static."
  },
  "style_medium": "Robotics video.",
  "artistic_style": "Real-world robot manipulation footage.",
  "context": "A robot is being used to prepare food by brushing oil onto potatoes before baking.",
  "actions": [
    {
      "time": "0:00-0:02",
      "description": "The robot arm moves the brush towards the baking tray.",
      "actor": "Robot arm.",
      "object": "Brush.",
      "contact_event": "The brush makes contact with the potato half.",
      "state_change": "The brush moves from the right side of the frame towards the center."
    },
    {
      "time": "0:02-0:06",
      "description": "The robot arm brushes the surface of one of the potato halves.",
      "actor": "Robot arm.",
      "object": "Brush.",
      "contact_event": "The brush rubs against the surface of the potato half.",
      "state_change": "The potato half is coated with oil."
    },
    {
      "time": "0:06-0:06",
      "description": "The robot arm moves the brush to the other potato half and repeats the brushing motion.",
      "actor": "Robot arm.",
      "object": "Brush.",
      "contact_event": "The brush makes contact with the other potato half.",
      "state_change": "The other potato half is coated with oil."
    }
  ],
  "text_and_signage_elements": [
    {
      "text": "Unreadable text on stickers.",
      "category": "physical_in_scene",
      "appearance": "Small, rectangular stickers with text.",
      "spatial": "On the top surface of the oven.",
      "spatial_temporal": "Static throughout the video.",
      "context": "Likely product information or safety warnings."
    }
  ],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0:00-0:06",
      "description": "The robot arm moves the brush towards the baking tray and begins to brush the surface of one of the potato halves. The arm then moves the brush to the other potato half and repeats the brushing motion.",
      "key_changes": "The robot arm moves from the right side of the frame towards the center, then remains in position.",
      "camera": "Static high angle shot.",
      "robot_motion": "The robot arm moves the brush towards the baking tray and begins to brush the surface of one of the potato halves. The arm then moves the brush to the other potato half and repeats the brushing motion.",
      "object_motion": "The brush moves from the right side of the frame towards the center, then remains in position.",
      "physical_reasoning": "The robot arm is designed to manipulate objects in its environment. It uses the brush to apply oil to the potatoes before baking."
    }
  ],
  "transitions": [],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "4:3",
  "duration": "0:06",
  "fps": 30,
  "robotics": {
    "embodiment": "Manipulation robot arm.",
    "end_effector": "Gripper holding a wooden brush.",
    "manipulated_objects": "Brush, potato halves.",
    "task_goal": "Apply oil to the potatoes before baking.",
    "initial_state": "The robot arm is positioned on the right side of the frame, holding the brush.",
    "terminal_state": "The robot arm is positioned over the baking tray, having brushed the surface of both potato halves.",
    "affordances": "The brush is designed to apply oil to the potatoes.",
    "failure_modes": "The robot arm may fail to properly coat the potatoes with oil."
  },
  "physical_reasoning": "The robot arm is designed to manipulate objects in its environment. It uses the brush to apply oil to the potatoes before baking. The brush is designed to apply oil to the potatoes. The robot arm moves the brush towards the baking tray and begins to brush the surface of one of the potato halves. The arm then moves the brush to the other potato half and repeats the brushing motion.",
  "uncertainties": [],
  "comprehensive_t2v_caption": "A robot arm with a gripper holding a wooden brush is seen moving towards a baking tray containing two potato halves. The robot arm brushes the surface of one of the potato halves, then moves the brush to the other potato half and repeats the brushing motion. The scene takes place in a kitchen with a black oven on the left, a pink bowl containing oil on the right, and a wooden shelf in the background with a bottle of oil, a bottle of sauce, and a plant. The lighting is bright and even, and the camera angle is high, providing a clear view of the robot arm's actions."
}
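
Review note: the last entry in the `actions` list above has the zero-length range `0:06-0:06`, although it describes a full brushing motion on the second potato half. The following is a minimal sketch of a check that flags such entries. It assumes the `M:SS-M:SS` format used in this output; the helper names `parse_mss` and `zero_length_ranges` are hypothetical and not part of the review pipeline.

```python
def parse_mss(stamp: str) -> int:
    """Convert an 'M:SS' stamp to whole seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def zero_length_ranges(actions):
    """Return the 'time' strings whose start equals their end."""
    flagged = []
    for action in actions:
        start, end = action["time"].split("-")
        if parse_mss(start) == parse_mss(end):
            flagged.append(action["time"])
    return flagged


# Time ranges copied from the "actions" list above.
actions = [
    {"time": "0:00-0:02"},
    {"time": "0:02-0:06"},
    {"time": "0:06-0:06"},
]
print(zero_length_ranges(actions))
```

Other outputs in this report write ranges as `0-2s` or `0.0s - 2.0s`, so a shared check would need one parser per format.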

Cosmos 3 Super (64B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Super (64B) · 15.0s · 2026-06-10T16:11:20+0000

Short Read

The robotic arm dips the brush into the pink bowl containing orange liquid, then moves the brush over the two potato halves on the baking tray, applying the liquid.

Training Caption Field

A robotic arm is shown in a kitchen setting, preparing to cook potatoes. The arm holds a brush and dips it into a pink bowl filled with orange liquid. It then moves the brush over two potato halves placed on a baking tray, applying the liquid evenly. The scene is well-lit, with a black oven on the left and various kitchen items on a wooden shelf in the background. The robotic arm's movements are precise and controlled, demonstrating its capability for food preparation tasks.

Full output JSON/text
{
  "scene_imagination": "A kitchen counter setup with a black oven on the left, a baking tray with two potato halves in the center, and a pink bowl with orange liquid on the right. A robotic arm with a gripper is positioned near the bowl, holding a brush. The background includes a wooden shelf with bottles, a plant, and orange chairs.",
  "temporal_caption": "The robotic arm dips the brush into the pink bowl containing orange liquid, then moves the brush over the two potato halves on the baking tray, applying the liquid.",
  "audio_description": "Quiet ambient kitchen sounds with subtle mechanical movements from the robotic arm.",
  "subjects": [
    {
      "description": "A black oven with a silver handle and control panel on the right side.",
      "appearance_details": "Glossy black finish, silver handle, digital control panel.",
      "relationship": "Stationary on the left side of the counter.",
      "location": "Left side of the frame.",
      "relative_size": "Large",
      "orientation": "Facing forward",
      "pose": "Stationary",
      "action": "None",
      "state_changes": "No significant change"
    },
    {
      "description": "A baking tray with two potato halves placed on it.",
      "appearance_details": "Silver metal tray, two round potato halves.",
      "relationship": "Center of the counter, in front of the oven.",
      "location": "Center of the frame.",
      "relative_size": "Medium",
      "orientation": "Facing up",
      "pose": "Stationary",
      "action": "None",
      "state_changes": "No significant change"
    },
    {
      "description": "A pink bowl containing orange liquid.",
      "appearance_details": "Round pink bowl, orange liquid inside.",
      "relationship": "Right side of the counter, near the robotic arm.",
      "location": "Right side of the frame.",
      "relative_size": "Medium",
      "orientation": "Facing up",
      "pose": "Stationary",
      "action": "None",
      "state_changes": "No significant change"
    },
    {
      "description": "A robotic arm with a gripper holding a brush.",
      "appearance_details": "Black and silver mechanical arm, gripper holding a brush.",
      "relationship": "Positioned near the pink bowl, interacting with the brush and potatoes.",
      "location": "Right side of the frame.",
      "relative_size": "Medium",
      "orientation": "Facing left",
      "pose": "Moving",
      "action": "Dips the brush into the bowl, then applies the liquid to the potatoes.",
      "state_changes": "Moves from the bowl to the potatoes."
    }
  ],
  "background_setting": "A modern kitchen counter with a wooden shelf above, holding bottles, a plant, and a box. Orange chairs are visible in the background.",
  "lighting": {
    "conditions": "Bright indoor lighting",
    "direction": "Top-lit",
    "shadows": "Soft shadows on the counter",
    "illumination_effect": "Even illumination across the scene"
  },
  "aesthetics": {
    "composition": "Centered on the baking tray, with the oven on the left and the bowl on the right.",
    "color_scheme": "Neutral tones with pops of orange and pink.",
    "mood_atmosphere": "Functional and focused",
    "patterns": "None"
  },
  "cinematography": {
    "camera_motion": "Static",
    "framing": "Wide shot",
    "camera_angle": "High angle",
    "depth_of_field": "Deep",
    "focus": "Sharp focus on the counter and objects",
    "lens_focal_length": "Wide-angle"
  },
  "style_medium": "Real-world robot manipulation footage",
  "artistic_style": "Realistic",
  "context": "A robotic arm is preparing food by applying liquid to potatoes before baking.",
  "actions": [
    {
      "time": "0-2s",
      "description": "The robotic arm dips the brush into the pink bowl containing orange liquid.",
      "actor": "Robotic arm",
      "object": "Brush and pink bowl",
      "contact_event": "Brush touches the liquid in the bowl.",
      "state_change": "Brush becomes coated with liquid."
    },
    {
      "time": "2-6s",
      "description": "The robotic arm moves the brush over the two potato halves on the baking tray, applying the liquid.",
      "actor": "Robotic arm",
      "object": "Brush and potato halves",
      "contact_event": "Brush touches the potatoes.",
      "state_change": "Potatoes become coated with liquid."
    }
  ],
  "text_and_signage_elements": [
    {
      "text": "Unreadable text on the oven control panel",
      "category": "physical_in_scene",
      "appearance": "Small white text on a black background",
      "spatial": "Right side of the oven",
      "spatial_temporal": "Static",
      "context": "Oven controls"
    }
  ],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0-2s",
      "description": "The robotic arm dips the brush into the bowl.",
      "key_changes": "Brush becomes coated with liquid.",
      "camera": "Static wide shot",
      "robot_motion": "Arm moves downward into the bowl.",
      "object_motion": "Brush dips into the liquid.",
      "physical_logic": "The brush is dipped to pick up the liquid."
    },
    {
      "segment_index": 1,
      "time_range": "2-6s",
      "description": "The robotic arm applies the liquid to the potatoes.",
      "key_changes": "Potatoes become coated with liquid.",
      "camera": "Static wide shot",
      "robot_motion": "Arm moves horizontally over the potatoes.",
      "object_motion": "Brush moves over the potatoes.",
      "physical_logic": "The brush applies the liquid to the potatoes."
    }
  ],
  "transitions": [
    "Continuous action from dipping the brush to applying the liquid."
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "16,9",
  "duration": "6.3s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Robotic arm with gripper",
    "end_effector": "Gripper holding a brush",
    "manipulated_objects": [
      "Brush",
      "Pink bowl",
      "Potato halves"
    ],
    "task_goal": "Apply liquid to potatoes",
    "initial_state": "Brush held above the bowl",
    "terminal_state": "Brush applied to potatoes",
    "affordances": "Dipping, brushing",
    "failure_modes": "None observed"
  },
  "physical_reasoning": "The robotic arm successfully dips the brush into the liquid and applies it to the potatoes, demonstrating precise control and coordination.",
  "uncertainties": [],
  "comprehensive_t2v_caption": "A robotic arm is shown in a kitchen setting, preparing to cook potatoes. The arm holds a brush and dips it into a pink bowl filled with orange liquid. It then moves the brush over two potato halves placed on a baking tray, applying the liquid evenly. The scene is well-lit, with a black oven on the left and various kitchen items on a wooden shelf in the background. The robotic arm's movements are precise and controlled, demonstrating its capability for food preparation tasks."
}
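
Review note: the `aspect_ratio` field above is recorded as `"16,9"`, while the `resolution` field reports 1152 x 816. The following is a minimal sketch that derives the ratio from the reported resolution for comparison. The helper name `reduced_ratio` is hypothetical and not part of the review pipeline.

```python
from math import gcd


def reduced_ratio(width: int, height: int) -> tuple[int, int]:
    """Reduce a width/height pair to its lowest integer terms."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


# Resolution copied from the "resolution" field above.
w, h = reduced_ratio(1152, 816)
print(f"{w}:{h}")             # 24:17
print(round(1152 / 816, 3))   # 1.412
```

For reference, 16:9 is about 1.778 and 4:3 is about 1.333, so neither matches the reported resolution exactly.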

Cosmos 3 Super (64B) / p09: Reproduction schema

ok
p09: Reproduction schema · Cosmos 3 Super (64B) · 21.8s · 2026-06-10T18:55:36+0000

Short Read

A robot arm uses a brush to apply a liquid from a pink bowl onto two potato slices resting on a foil-lined baking tray, with an oven and other kitchen items visible in the background.

Training Caption Field

A robot arm uses a brush to apply a liquid from a pink bowl onto two potato slices resting on a foil-lined baking tray, with an oven and other kitchen items visible in the background.

Full output JSON/text
{
  "schema_version": "p09_reproduction_v1",
  "one_sentence_summary": "A robot arm uses a brush to apply a liquid from a pink bowl onto two potato slices resting on a foil-lined baking tray, with an oven and other kitchen items visible in the background.",
  "video_metadata": {
    "resolution": {
      "W": 1152,
      "H": 816
    },
    "aspect_ratio": "16,9",
    "duration_sec": 6.3,
    "fps": 30,
    "frame_count": 189
  },
  "camera": {
    "viewpoint": "top-down",
    "framing": "medium",
    "camera_angle": "high angle",
    "camera_motion": "static",
    "estimated_height_or_distance": "1.5 meters",
    "lens_or_distortion": "none",
    "crop_boundaries": "none",
    "stability": "stable"
  },
  "scene_layout": {
    "coordinate_frame": "2d",
    "background": "kitchen counter with an oven, a pink bowl, a bottle of oil, and a carton of milk",
    "surface": "metallic countertop",
    "lighting": "bright indoor lighting",
    "global_spatial_map": "The scene is a kitchen counter with an oven on the left, a baking tray with potato slices in the center, a pink bowl with a brush on the right, and a bottle of oil and a carton of milk in the background.",
    "occlusions": "none"
  },
  "subjects": [
    {
      "id": "robot_arm",
      "category": "robot arm",
      "appearance": "black and silver metallic with a gripper at the end",
      "visible_parts": "arm, gripper",
      "initial_pose": "gripper holding a brush over a pink bowl",
      "final_pose": "gripper holding a brush over a baking tray",
      "workspace_location": "right side of the frame",
      "motion_path": "moves from the pink bowl to the baking tray",
      "end_effector": "gripper holding a brush",
      "uncertainties": "none"
    }
  ],
  "objects": [
    {
      "id": "baking_tray",
      "category": "baking tray",
      "appearance": "metallic with a foil liner",
      "label_or_text": "none",
      "material": "metal",
      "shape": "rectangular",
      "size_relative_to_frame": "medium",
      "initial_pose": "center of the frame",
      "final_pose": "center of the frame",
      "state_changes": "none",
      "uncertainties": "none"
    },
    {
      "id": "potato_slices",
      "category": "potato slices",
      "appearance": "golden brown",
      "label_or_text": "none",
      "material": "food",
      "shape": "round",
      "size_relative_to_frame": "small",
      "initial_pose": "on the baking tray",
      "final_pose": "on the baking tray",
      "state_changes": "none",
      "uncertainties": "none"
    },
    {
      "id": "pink_bowl",
      "category": "bowl",
      "appearance": "pink plastic",
      "label_or_text": "none",
      "material": "plastic",
      "shape": "round",
      "size_relative_to_frame": "small",
      "initial_pose": "right side of the frame",
      "final_pose": "right side of the frame",
      "state_changes": "none",
      "uncertainties": "none"
    },
    {
      "id": "brush",
      "category": "brush",
      "appearance": "wooden handle with bristles",
      "label_or_text": "none",
      "material": "wood and bristles",
      "shape": "cylindrical",
      "size_relative_to_frame": "small",
      "initial_pose": "in the pink bowl",
      "final_pose": "over the baking tray",
      "state_changes": "moved from the pink bowl to the baking tray",
      "uncertainties": "none"
    },
    {
      "id": "oven",
      "category": "oven",
      "appearance": "black with a silver handle",
      "label_or_text": "none",
      "material": "metal",
      "shape": "rectangular",
      "size_relative_to_frame": "large",
      "initial_pose": "left side of the frame",
      "final_pose": "left side of the frame",
      "state_changes": "none",
      "uncertainties": "none"
    },
    {
      "id": "oil_bottle",
      "category": "bottle",
      "appearance": "clear glass with a brown label",
      "label_or_text": "none",
      "material": "glass",
      "shape": "cylindrical",
      "size_relative_to_frame": "small",
      "initial_pose": "background right",
      "final_pose": "background right",
      "state_changes": "none",
      "uncertainties": "none"
    },
    {
      "id": "milk_carton",
      "category": "carton",
      "appearance": "white with blue and red text",
      "label_or_text": "none",
      "material": "cardboard",
      "shape": "rectangular",
      "size_relative_to_frame": "small",
      "initial_pose": "background right",
      "final_pose": "background right",
      "state_changes": "none",
      "uncertainties": "none"
    }
  ],
  "initial_state": "The robot arm is holding a brush over a pink bowl containing a liquid. The baking tray with two potato slices is in the center of the frame. The oven is on the left, and the oil bottle and milk carton are in the background.",
  "final_state": "The robot arm has moved the brush from the pink bowl to the baking tray, applying the liquid to the potato slices.",
  "timeline": [
    {
      "time_range": "0-2s",
      "description": "The robot arm is holding a brush over a pink bowl containing a liquid.",
      "active_subjects": [
        "robot_arm"
      ],
      "object_states": {
        "baking_tray": "center of the frame",
        "potato_slices": "on the baking tray",
        "pink_bowl": "right side of the frame",
        "brush": "in the pink bowl"
      },
      "robot_motion": "none",
      "object_motion": "none",
      "gripper_state": "holding the brush",
      "contact_state": "brush in contact with the liquid in the pink bowl",
      "spatial_changes": "none",
      "physical_causality": "none"
    },
    {
      "time_range": "2-6s",
      "description": "The robot arm moves the brush from the pink bowl to the baking tray, applying the liquid to the potato slices.",
      "active_subjects": [
        "robot_arm"
      ],
      "object_states": {
        "baking_tray": "center of the frame",
        "potato_slices": "on the baking tray",
        "pink_bowl": "right side of the frame",
        "brush": "over the baking tray"
      },
      "robot_motion": "moving the brush from the pink bowl to the baking tray",
      "object_motion": "none",
      "gripper_state": "holding the brush",
      "contact_state": "brush in contact with the potato slices",
      "spatial_changes": "brush moves from the pink bowl to the baking tray",
      "physical_causality": "the brush applies the liquid to the potato slices"
    }
  ],
  "parallel_action_graph": {
    "simultaneous_actions": [],
    "dependencies": [],
    "ordering_constraints": []
  },
  "keyframes": [
    {
      "time": 0,
      "frame_description": "The robot arm is holding a brush over a pink bowl containing a liquid. The baking tray with two potato slices is in the center of the frame. The oven is on the left, and the oil bottle and milk carton are in the background.",
      "subject_poses": {
        "robot_arm": "gripper holding a brush over a pink bowl"
      },
      "object_poses": {
        "baking_tray": "center of the frame",
        "potato_slices": "on the baking tray",
        "pink_bowl": "right side of the frame",
        "brush": "in the pink bowl"
      },
      "contacts": [
        {
          "subject_id": "robot_arm",
          "object_id": "brush",
          "contact_type": "holding"
        },
        {
          "subject_id": "brush",
          "object_id": "pink_bowl",
          "contact_type": "in contact with the liquid"
        }
      ],
      "important_visual_details": []
    },
    {
      "time": 6,
      "frame_description": "The robot arm has moved the brush from the pink bowl to the baking tray, applying the liquid to the potato slices.",
      "subject_poses": {
        "robot_arm": "gripper holding a brush over the baking tray"
      },
      "object_poses": {
        "baking_tray": "center of the frame",
        "potato_slices": "on the baking tray",
        "pink_bowl": "right side of the frame",
        "brush": "over the baking tray"
      },
      "contacts": [
        {
          "subject_id": "robot_arm",
          "object_id": "brush",
          "contact_type": "holding"
        },
        {
          "subject_id": "brush",
          "object_id": "potato_slices",
          "contact_type": "in contact with the potato slices"
        }
      ],
      "important_visual_details": []
    }
  ],
  "contact_events": [
    {
      "time": 0,
      "subject_id": "robot_arm",
      "object_id": "brush",
      "contact_type": "holding",
      "contact_location": "gripper",
      "grasp_or_support_quality": "firm",
      "release_or_transfer": "none",
      "visible_evidence": "the brush is held by the gripper"
    },
    {
      "time": 0,
      "subject_id": "brush",
      "object_id": "pink_bowl",
      "contact_type": "in contact with the liquid",
      "contact_location": "inside the bowl",
      "grasp_or_support_quality": "none",
      "release_or_transfer": "none",
      "visible_evidence": "the brush is in the liquid"
    },
    {
      "time": 6,
      "subject_id": "robot_arm",
      "object_id": "brush",
      "contact_type": "holding",
      "contact_location": "gripper",
      "grasp_or_support_quality": "firm",
      "release_or_transfer": "none",
      "visible_evidence": "the brush is held by the gripper"
    },
    {
      "time": 6,
      "subject_id": "brush",
      "object_id": "potato_slices",
      "contact_type": "in contact with the potato slices",
      "contact_location": "on the baking tray",
      "grasp_or_support_quality": "none",
      "release_or_transfer": "none",
      "visible_evidence": "the brush is on the potato slices"
    }
  ],
  "physics_and_affordances": {
    "support_surfaces": [
      "baking tray",
      "pink bowl",
      "oven",
      "oil bottle",
      "milk carton"
    ],
    "gravity_and_stability": "stable",
    "force_or_constraint_reasoning": "none",
    "collision_or_near_collision": "none",
    "object_permanence": "none",
    "task_goal": "apply liquid to potato slices",
    "failure_modes": "none",
    "next_action_prediction": "none"
  },
  "appearance_fidelity": {
    "colors": [
      "black",
      "silver",
      "pink",
      "golden brown",
      "white",
      "blue",
      "red",
      "brown",
      "clear glass"
    ],
    "textures": [
      "metallic",
      "plastic",
      "wood",
      "food",
      "cardboard",
      "glass"
    ],
    "lighting_shadows": "bright indoor lighting",
    "robot_morphology": "black and silver metallic with a gripper at the end",
    "object_fine_details": [
      "baking tray with a foil liner",
      "two potato slices",
      "pink bowl with a brush",
      "oven with a silver handle",
      "oil bottle with a brown label",
      "milk carton with blue and red text"
    ],
    "background_fine_details": [
      "kitchen counter with an oven, a pink bowl, a bottle of oil, and a carton of milk"
    ]
  },
  "text_and_labels": [
    {
      "object_id": "milk_carton",
      "raw_text": "none",
      "legibility": "illegible",
      "appearance": "blue and red text on a white background",
      "spatial_location": "background right",
      "time_visible": "0-6s"
    }
  ],
  "audio_description": "none",
  "generation_controls": {
    "shot_type": "medium",
    "camera_motion_instruction": "static",
    "motion_style": "none",
    "duration_instruction": "6.3s",
    "spatial_consistency_requirements": "none",
    "negative_constraints": "none",
    "recommended_caption_style": "none"
  },
  "uncertainties": [],
  "comprehensive_reproduction_caption": "A robot arm uses a brush to apply a liquid from a pink bowl onto two potato slices resting on a foil-lined baking tray, with an oven and other kitchen items visible in the background."
}
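
Review note: the `video_metadata` block above can be checked for internal consistency, since `frame_count` should equal `duration_sec` times `fps`. The following is a minimal sketch using the values from this output. The helper name `expected_frame_count` is hypothetical, and rounding to the nearest frame is an assumption.

```python
def expected_frame_count(duration_sec: float, fps: float) -> int:
    """Frame count implied by duration and frame rate, rounded to the nearest frame."""
    return round(duration_sec * fps)


# Values copied from the "video_metadata" block above.
video_metadata = {"duration_sec": 6.3, "fps": 30, "frame_count": 189}

expected = expected_frame_count(video_metadata["duration_sec"], video_metadata["fps"])
print(expected == video_metadata["frame_count"])
```

Here the three values agree, even though the `aspect_ratio` string in the same block does not match the reported resolution.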

rovid_004

Qwen 3.5 9B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 9B · 76.3s · 2026-06-10T17:46:08+0000

Short Read

At the beginning, the robot arm is stationary. In the early phase, the robot's gripper descends from the top right towards the grey earbuds. By the middle, the gripper makes contact with the earbuds and lifts them slightly. In the late phase, the robot moves the earbuds towards the orange book and places them on top of the book cover. At the end, the robot arm retracts upwards, leaving the earbuds resting on the book.

Training Caption Field

A white robotic arm with a green illuminated base sits on a light wooden table cluttered with various objects including a blue mug, a black sneaker, a spoon, a woven bracelet, an orange book, a black block, and grey earbuds. The robot's gripper descends from the top right, grasps the grey earbuds, lifts them, and moves them horizontally towards the orange book. The robot then places the earbuds on top of the book cover and retracts its arm upwards, completing the task.

Full output JSON/text
{
  "scene_imagination": "A white robotic arm with a green illuminated base is positioned on a light wooden table. The table surface is cluttered with various objects including a blue mug, a black sneaker, a spoon, a woven bracelet, an orange book with a colorful cover, a black rectangular block, a pair of grey earbuds, and a tape measure near the wall. The robot's end-effector is visible in the upper right quadrant, hovering over the scene.",
  "temporal_caption": "At the beginning, the robot arm is stationary. In the early phase, the robot's gripper descends from the top right towards the grey earbuds. By the middle, the gripper makes contact with the earbuds and lifts them slightly. In the late phase, the robot moves the earbuds towards the orange book and places them on top of the book cover. At the end, the robot arm retracts upwards, leaving the earbuds resting on the book.",
  "audio_description": null,
  "subjects": [
    {
      "description": "White robotic arm",
      "appearance_details": "Smooth white segments, black cables, green LED light on base",
      "relationship": "Manipulator",
      "location": "Left side of the table",
      "relative_size": "Large compared to objects",
      "orientation": "Vertical, angled towards the center",
      "pose": "Stationary base, moving upper arm",
      "action": "Reaching and placing objects",
      "state_changes": "End-effector moves from top right to center and back up"
    },
    {
      "description": "Grey earbuds",
      "appearance_details": "Small, grey, plastic casing",
      "relationship": "Object being manipulated",
      "location": "Initially near the wall, finally on the orange book",
      "relative_size": "Small",
      "orientation": "Lying flat, then placed upright",
      "pose": "Static, then moved",
      "action": "Being picked up and placed",
      "state_changes": "Moved from table surface to on top of the book"
    },
    {
      "description": "Orange book",
      "appearance_details": "Orange cover with colorful abstract design",
      "relationship": "Target surface",
      "location": "Center-right of the table",
      "relative_size": "Medium",
      "orientation": "Flat on the table",
      "pose": "Static",
      "action": "Receiving an object",
      "state_changes": "Earbuds placed on top"
    }
  ],
  "background_setting": "An indoor setting with a light-colored wooden table surface. A white wall is visible in the background with a dark curtain or fabric hanging on the left side.",
  "lighting": {
    "conditions": "Indoor artificial lighting",
    "direction": "Overhead, casting soft shadows",
    "shadows": "Soft shadows cast by the robot arm and objects on the table",
    "illumination_effect": "Even illumination with slight highlights on the robot's white surface"
  },
  "aesthetics": {
    "composition": "Static shot with objects arranged in a semi-circle",
    "color_scheme": "Neutral wood tones, white robot, colorful accents (orange book, blue mug)",
    "mood_atmosphere": "Clinical, functional, simulation-like",
    "patterns": "Wood grain texture on the table"
  },
  "cinematography": {
    "framing": "Medium shot, slightly high angle",
    "camera_angle": "High angle looking down at the table",
    "depth_of_field": "Deep focus, all objects are sharp",
    "focus": "Fixed on the table surface and objects",
    "lens_focal_length": "Unknown, likely standard or slightly wide",
    "camera_motion": "Static camera"
  },
  "style_medium": "robotics video",
  "artistic_style": "synthetic robot simulation",
  "context": "A robotic manipulation task where the robot is instructed to move a specific object (earbuds) from one location to another (onto a book).",
  "actions": [
    {
      "time": "0.0s - 2.0s",
      "description": "Robot arm is stationary, gripper is out of frame or idle.",
      "actor": "Robot arm",
      "object": null,
      "contact_event": null,
      "state_change": null,
      "evidence": "No movement observed"
    },
    {
      "time": "2.0s - 5.0s",
      "description": "Robot gripper descends towards the grey earbuds.",
      "actor": "Robot arm",
      "object": "Grey earbuds",
      "contact_event": "Gripper approaches earbuds",
      "state_change": null,
      "evidence": "Gripper moves down and aligns with earbuds"
    },
    {
      "time": "5.0s - 7.0s",
      "description": "Robot gripper grasps the earbuds and lifts them.",
      "actor": "Robot arm",
      "object": "Grey earbuds",
      "contact_event": "Grasp established",
      "state_change": "Earbuds lifted off table",
      "evidence": "Earbuds move upwards with the gripper"
    },
    {
      "time": "7.0s - 10.0s",
      "description": "Robot moves the earbuds towards the orange book.",
      "actor": "Robot arm",
      "object": "Grey earbuds",
      "contact_event": null,
      "state_change": null,
      "evidence": "Earbuds translate horizontally towards the book"
    },
    {
      "time": "10.0s - 12.0s",
      "description": "Robot places the earbuds on the orange book.",
      "actor": "Robot arm",
      "object": "Grey earbuds",
      "contact_event": "Earbuds placed on book",
      "state_change": "Earbuds now on top of the book",
      "evidence": "Earbuds settle on the book cover"
    },
    {
      "time": "12.0s - 12.967s",
      "description": "Robot arm retracts upwards.",
      "actor": "Robot arm",
      "object": null,
      "contact_event": null,
      "state_change": null,
      "evidence": "Gripper moves away from the book"
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "0.0s - 2.0s",
      "description": "Initial state with robot arm stationary.",
      "key_changes": "None",
      "camera": "Static",
      "robot_motion": "None",
      "object_motion": "None",
      "physical_logic": "Preparation phase"
    },
    {
      "segment_index": 2,
      "time_range": "2.0s - 5.0s",
      "description": "Robot approaches the earbuds.",
      "key_changes": "Gripper enters frame and moves towards earbuds",
      "camera": "Static",
      "robot_motion": "Descending",
      "object_motion": "None",
      "physical_logic": "Approach trajectory"
    },
    {
      "segment_index": 3,
      "time_range": "5.0s - 7.0s",
      "description": "Robot grasps and lifts the earbuds.",
      "key_changes": "Earbuds are lifted from the table",
      "camera": "Static",
      "robot_motion": "Lifting",
      "object_motion": "Vertical translation",
      "physical_logic": "Grasp and lift"
    },
    {
      "segment_index": 4,
      "time_range": "7.0s - 10.0s",
      "description": "Robot transports the earbuds to the book.",
      "key_changes": "Earbuds move horizontally towards the book",
      "camera": "Static",
      "robot_motion": "Horizontal translation",
      "object_motion": "Horizontal translation",
      "physical_logic": "Transport"
    },
    {
      "segment_index": 5,
      "time_range": "10.0s - 12.0s",
      "description": "Robot places the earbuds on the book.",
      "key_changes": "Earbuds are placed on the book cover",
      "camera": "Static",
      "robot_motion": "Placing",
      "object_motion": "Vertical placement",
      "physical_logic": "Release"
    },
    {
      "segment_index": 6,
      "time_range": "12.0s - 12.967s",
      "description": "Robot retracts.",
      "key_changes": "Gripper moves away",
      "camera": "Static",
      "robot_motion": "Retracting",
      "object_motion": "None",
      "physical_logic": "Task completion"
    }
  ],
  "transitions": [
    "Robot enters frame from top right",
    "Gripper makes contact with earbuds",
    "Earbuds are lifted and moved",
    "Earbuds are placed on the book",
    "Robot retracts"
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "4:3",
  "duration": "12.967s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "White articulated arm with black cables",
    "end_effector": "Gripper with black fingers",
    "manipulated_objects": [
      "Grey earbuds"
    ],
    "task_goal": "Move earbuds from table to book",
    "initial_state": "Earbuds on table, robot idle",
    "terminal_state": "Earbuds on book, robot idle",
    "affordances": "Gripper suitable for small objects",
    "failure_modes": "None observed"
  },
  "physical_reasoning": {
    "contact_and_grasp": "Gripper successfully grasps the earbuds",
    "support_and_stability": "Earbuds are stable on the book",
    "force_gravity_friction": "Gravity acts on the earbuds, friction keeps them on the book",
    "causal_chain": "Robot moves -> Grasp -> Lift -> Move -> Place -> Release",
    "object_permanence": "Earbuds remain on the book after placement",
    "next_action_prediction": "Robot may move to another object or exit the scene"
  },
  "uncertainties": [
    "Exact grip force applied is unknown",
    "Future actions of the robot are unknown"
  ],
  "comprehensive_t2v_caption": "A white robotic arm with a green illuminated base sits on a light wooden table cluttered with various objects including a blue mug, a black sneaker, a spoon, a woven bracelet, an orange book, a black block, and grey earbuds. The robot's gripper descends from the top right, grasps the grey earbuds, lifts them, and moves them horizontally towards the orange book. The robot then places the earbuds on top of the book cover and retracts its arm upwards, completing the task."
}
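
Review note: the p08 outputs for this clip write time ranges in different forms ("0.0s - 2.0s", "00:00 - 00:02", "0.0s-1.0s"), and not every output's last range reaches the reported 12.967s duration. The following is a minimal sketch, not part of the captioning pipeline, for normalizing those strings to seconds and measuring how much of the clip is left uncovered. The function names are hypothetical, and the sketch assumes only the three formats seen in these outputs.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert '12.967s' or 'MM:SS' to seconds (only the formats seen here)."""
    stamp = stamp.strip()
    if ":" in stamp:
        minutes, seconds = stamp.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(stamp.rstrip("s"))


def parse_range(text: str) -> tuple[float, float]:
    """Split 'start - end' (with or without spaces around the dash)."""
    start, end = re.split(r"\s*-\s*", text.strip(), maxsplit=1)
    return to_seconds(start), to_seconds(end)


def coverage_gap(ranges: list[str], duration: float) -> float:
    """Seconds of the clip remaining after the latest range ends."""
    last_end = max(parse_range(r)[1] for r in ranges)
    return round(duration - last_end, 3)


# Last ranges copied from three of the outputs for this clip.
print(coverage_gap(["12.0s - 12.967s"], 12.967))  # Qwen 3.5 9B
print(coverage_gap(["00:05 - 00:12"], 12.967))    # Qwen 3.5 27B
print(coverage_gap(["10.0s-12.9s"], 12.967))      # Cosmos 3 Nano (16B)
```

A nonzero gap is not necessarily an error; it only flags outputs whose timeline stops short of the reported duration.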

Qwen 3.5 27B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 27B · 159.6s · 2026-06-10T18:08:07+0000

Short Read

At the beginning, the robotic arm is stationary. In the early phase, the arm begins to move, extending its joints. In the middle, the gripper descends towards the small grey object near the tape measure. Towards the end, the gripper makes contact with the object and lifts it slightly off the table surface.

Training Caption Field

A white robotic arm with a glowing green base light is positioned on a light wooden table cluttered with various items including a sneaker, a blue mug, an orange book, and a tape measure. The camera maintains a static high-angle shot. Initially, the robot is idle. It then articulates its joints, extending its black and grey gripper towards a small, grey, irregular object located near the back of the table. The gripper descends, makes contact with the object, and successfully grasps it. The arm then lifts the object slightly off the table surface, demonstrating a successful pick-and-place operation in a simulated environment.

Full output JSON/text
{
  "scene_imagination": "A white robotic arm with a black and grey gripper is positioned on the left side of a light wooden table. Various objects are scattered on the table: a black sneaker with green accents, a blue mug, a yellow tennis ball, a woven ring, an orange book with a cartoon cover, a metal spoon, a black rectangular block, a small grey object, and a tape measure in the background. The robot's base has a glowing green light. The background consists of a white wall and a dark curtain to the left.",
  "temporal_caption": "At the beginning, the robotic arm is stationary. In the early phase, the arm begins to move, extending its joints. In the middle, the gripper descends towards the small grey object near the tape measure. Towards the end, the gripper makes contact with the object and lifts it slightly off the table surface.",
  "audio_description": null,
  "subjects": [
    {
      "description": "A white robotic arm with a multi-jointed structure and a black/grey parallel-jaw gripper.",
      "appearance_details": "White matte finish, black cables at the base, green status light.",
      "relationship": "The active agent in the scene.",
      "location": "Left side of the frame, base on the table.",
      "relative_size": "Large, dominating the left foreground.",
      "orientation": "Facing towards the center-right of the table.",
      "pose": "Initially static, then articulating to reach for an object.",
      "action": "Reaching and grasping.",
      "state_changes": "Transitions from idle to active manipulation."
    },
    {
      "description": "A small, grey, irregularly shaped object.",
      "appearance_details": "Small, matte grey.",
      "relationship": "The target object for the robot.",
      "location": "Near the back right of the table, close to the tape measure.",
      "relative_size": "Small.",
      "orientation": "Resting on the table.",
      "pose": "Static until grasped.",
      "action": "Being grasped.",
      "state_changes": "Lifted from the table surface."
    },
    {
      "description": "Collection of miscellaneous objects on the table.",
      "appearance_details": "Includes a sneaker, mug, book, tennis ball, etc.",
      "relationship": "Static environment objects.",
      "location": "Scattered across the center and right of the table.",
      "relative_size": "Varied.",
      "orientation": "Various.",
      "pose": "Static.",
      "action": "None.",
      "state_changes": "None."
    }
  ],
  "background_setting": "An indoor setting, likely a laboratory or office, featuring a light-colored wooden table against a plain white wall. A dark curtain is visible on the far left.",
  "lighting": {
    "conditions": "Artificial indoor lighting.",
    "direction": "Overhead and slightly from the front.",
    "shadows": "Soft shadows cast by the robot and objects onto the table.",
    "illumination_effect": "Even illumination with no harsh contrasts."
  },
  "aesthetics": {
    "composition": "The robot is placed on the left, creating a diagonal line of action towards the objects on the right.",
    "color_scheme": "Neutral tones (white, wood, grey) with pops of color from the objects (orange book, blue mug, green shoe accents).",
    "mood_atmosphere": "Clinical, functional, and focused.",
    "patterns": "Wood grain on the table."
  },
  "cinematography": {
    "framing": "Medium shot capturing the robot base and the array of objects.",
    "camera_angle": "High angle, looking down at the table.",
    "depth_of_field": "Deep focus, keeping both the robot and background objects relatively sharp.",
    "focus": "On the robot arm and the table surface.",
    "lens_focal_length": "Unknown.",
    "camera_motion": "Static camera."
  },
  "style_medium": "robotics video",
  "artistic_style": "synthetic robot simulation",
  "context": "The video depicts a simulated robotic manipulation task where a robot arm is programmed to identify and grasp a specific object from a cluttered table.",
  "actions": [
    {
      "time": "00:00 - 00:02",
      "description": "The robotic arm remains stationary.",
      "actor": "Robotic arm",
      "object": "None",
      "contact_event": "None",
      "state_change": "None",
      "evidence": "No movement observed."
    },
    {
      "time": "00:02 - 00:05",
      "description": "The robotic arm articulates, moving its gripper towards the small grey object.",
      "actor": "Robotic arm",
      "object": "Small grey object",
      "contact_event": "Approaching",
      "state_change": "Arm position changes.",
      "evidence": "Visible joint rotation and gripper translation."
    },
    {
      "time": "00:05 - 00:12",
      "description": "The gripper closes around the small grey object and lifts it.",
      "actor": "Robotic arm",
      "object": "Small grey object",
      "contact_event": "Grasp and lift",
      "state_change": "Object is lifted off the table.",
      "evidence": "Gripper jaws close, object moves upward with the gripper."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "00:00 - 00:02",
      "description": "Static shot of the robot and objects.",
      "key_changes": "None.",
      "camera": "Static.",
      "robot_motion": "None.",
      "object_motion": "None.",
      "physical_logic": "Initial state."
    },
    {
      "segment_index": 2,
      "time_range": "00:02 - 00:05",
      "description": "Robot arm moves towards the target object.",
      "key_changes": "Robot arm articulation.",
      "camera": "Static.",
      "robot_motion": "Reaching.",
      "object_motion": "None.",
      "physical_logic": "Path planning execution."
    },
    {
      "segment_index": 3,
      "time_range": "00:05 - 00:12",
      "description": "Robot grasps and lifts the object.",
      "key_changes": "Contact and lift.",
      "camera": "Static.",
      "robot_motion": "Grasping and lifting.",
      "object_motion": "Lifted.",
      "physical_logic": "Successful grasp and manipulation."
    }
  ],
  "transitions": [
    "Static to motion",
    "Approach to contact",
    "Contact to lift"
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "16:9",
  "duration": "12.967s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "White articulated robotic arm with a parallel-jaw gripper.",
    "end_effector": "Black and grey parallel-jaw gripper.",
    "manipulated_objects": "Small grey object.",
    "task_goal": "Grasp and lift the small grey object.",
    "initial_state": "Robot idle, objects on table.",
    "terminal_state": "Robot holding the small grey object in the air.",
    "affordances": "Grasping, lifting, moving.",
    "failure_modes": "None observed."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The gripper successfully closes around the object, indicating a stable grasp.",
    "support_and_stability": "The object is supported by the gripper after being lifted from the table.",
    "force_gravity_friction": "The robot applies sufficient force to overcome gravity and lift the object.",
    "causal_chain": "Arm movement leads to gripper positioning, which leads to object contact and subsequent lifting.",
    "object_permanence": "Objects remain in place unless acted upon by the robot.",
    "next_action_prediction": "The robot may move the object to a new location or place it elsewhere."
  },
  "uncertainties": [
    "The exact nature of the small grey object is unclear.",
    "The specific programming or control method of the robot is not visible."
  ],
  "comprehensive_t2v_caption": "A white robotic arm with a glowing green base light is positioned on a light wooden table cluttered with various items including a sneaker, a blue mug, an orange book, and a tape measure. The camera maintains a static high-angle shot. Initially, the robot is idle. It then articulates its joints, extending its black and grey gripper towards a small, grey, irregular object located near the back of the table. The gripper descends, makes contact with the object, and successfully grasps it. The arm then lifts the object slightly off the table surface, demonstrating a successful pick-and-place operation in a simulated environment."
}
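
Review note: this output reports an aspect_ratio of "16:9" next to a 1152 x 816 resolution, while the Qwen 3.5 9B and Cosmos 3 Nano outputs for the same clip report "4:3". Below is a minimal sketch for checking a reported ratio against the reported resolution. The function names and the 2% tolerance are assumptions chosen for illustration, not part of the p08 schema.

```python
from math import gcd


def reduced_ratio(width: int, height: int) -> str:
    """Reduce width:height to lowest terms, e.g. 1920x1080 -> '16:9'."""
    g = gcd(width, height)
    return f"{width // g}:{height // g}"


def ratio_matches(reported: str, width: int, height: int, tol: float = 0.02) -> bool:
    """True if the reported 'A:B' ratio is within a relative tolerance of width/height."""
    num, den = (float(part) for part in reported.split(":"))
    actual = width / height
    return abs(num / den - actual) <= tol * actual


print(reduced_ratio(1152, 816))          # exact ratio of the reported resolution
print(ratio_matches("16:9", 1152, 816))  # label in this output
print(ratio_matches("4:3", 1152, 816))   # label in the 9B and Nano outputs
```

Under this check neither label matches: 1152 x 816 reduces to 24:17 (about 1.41), which is closer to 4:3 than to 16:9 but equal to neither.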

Cosmos 3 Nano (16B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Nano (16B) · 20.2s · 2026-06-10T18:36:24+0000

Short Read

The video begins with the robotic arm stationary, its end-effector hovering above the table. The arm then moves its end-effector towards the small gray object near the wall, grasps it, and lifts it slightly. The arm then moves the object over the orange book and releases it, placing it on top of the book. The arm returns to its initial position, and the video ends.

Training Caption Field

A robotic arm with a white body and green indicator light is positioned on a light wooden table. The table holds various objects: a black and green sneaker, a purple mug, a small yellow object, a woven bracelet, an orange book with colorful illustrations, a black rectangular object, and a small gray object near the wall. The robotic arm moves its end-effector, which has a black and silver tool attached, to interact with the small gray object. The arm moves its end-effector towards the small gray object near the wall, grasps it, lifts it, moves it over the orange book, and releases it on top of the book. The arm then returns to its initial position. The background features a plain white wall and a dark curtain on the left. The lighting is even and artificial, typical of an indoor workspace or lab setting.

Full output JSON/text
{
  "scene_imagination": "A robotic arm with a white body and green indicator light is positioned on a light wooden table. The table holds various objects: a black and green sneaker, a purple mug, a small yellow object, a woven bracelet, an orange book with colorful illustrations, a black rectangular object, and a small gray object near the wall. The robot arm moves its end-effector, which has a black and silver tool attached, to interact with the small gray object. The background features a plain white wall and a dark curtain on the left. The lighting is even and artificial, typical of an indoor workspace or lab setting.",
  "temporal_caption": "The video begins with the robotic arm stationary, its end-effector hovering above the table. The arm then moves its end-effector towards the small gray object near the wall, grasps it, and lifts it slightly. The arm then moves the object over the orange book and releases it, placing it on top of the book. The arm returns to its initial position, and the video ends.",
  "audio_description": "The video is silent, with no audible sounds or music.",
  "subjects": [
    {
      "description": "A robotic arm with a white body and a green indicator light on its base. It has a black cable extending from its base and a black end-effector with a silver tool attached.",
      "appearance_details": "White body with green indicator light, black cable, black end-effector with silver tool.",
      "relationship": "Interacting with objects on the table.",
      "location": "Left side of the frame, extending towards the center.",
      "relative_size": "Large within frame.",
      "orientation": "Facing the table.",
      "pose": "Extended towards the table.",
      "action": "Moving its end-effector to interact with objects on the table.",
      "state_changes": "Moves its end-effector towards the small gray object, grasps it, lifts it, moves it over the orange book, and releases it."
    },
    {
      "description": "A black and green sneaker.",
      "appearance_details": "Black upper with green accents and a white sole.",
      "relationship": "Resting on the table.",
      "location": "Center-left of the frame.",
      "relative_size": "Small within frame.",
      "orientation": "Facing forward.",
      "pose": "Resting on the table.",
      "action": "Stationary.",
      "state_changes": "No significant change."
    },
    {
      "description": "A purple mug.",
      "appearance_details": "Purple body with a handle.",
      "relationship": "Resting on the table.",
      "location": "Center of the frame.",
      "relative_size": "Small within frame.",
      "orientation": "Facing forward.",
      "pose": "Resting on the table.",
      "action": "Stationary.",
      "state_changes": "No significant change."
    },
    {
      "description": "A small yellow object.",
      "appearance_details": "Yellow, possibly a toy or tool.",
      "relationship": "Resting on the table.",
      "location": "Center of the frame, near the mug.",
      "relative_size": "Very small within frame.",
      "orientation": "Facing forward.",
      "pose": "Resting on the table.",
      "action": "Stationary.",
      "state_changes": "No significant change."
    },
    {
      "description": "A woven bracelet.",
      "appearance_details": "Brown, circular, woven texture.",
      "relationship": "Resting on the table.",
      "location": "Center-right of the frame.",
      "relative_size": "Small within frame.",
      "orientation": "Facing forward.",
      "pose": "Resting on the table.",
      "action": "Stationary.",
      "state_changes": "No significant change."
    },
    {
      "description": "An orange book with colorful illustrations.",
      "appearance_details": "Orange cover with colorful illustrations and text.",
      "relationship": "Resting on the table.",
      "location": "Right side of the frame.",
      "relative_size": "Medium within frame.",
      "orientation": "Facing forward.",
      "pose": "Resting on the table.",
      "action": "Stationary.",
      "state_changes": "A small gray object is placed on top of it."
    },
    {
      "description": "A black rectangular object.",
      "appearance_details": "Black, rectangular, possibly a phone or remote.",
      "relationship": "Resting on the table.",
      "location": "Far right of the frame.",
      "relative_size": "Small within frame.",
      "orientation": "Facing forward.",
      "pose": "Resting on the table.",
      "action": "Stationary.",
      "state_changes": "No significant change."
    },
    {
      "description": "A small gray object.",
      "appearance_details": "Gray, possibly a toy or tool.",
      "relationship": "Initially resting on the table, then moved by the robotic arm.",
      "location": "Near the wall, initially, then moved to the orange book.",
      "relative_size": "Very small within frame.",
      "orientation": "Facing forward.",
      "pose": "Resting on the table, then lifted and placed on the orange book.",
      "action": "Moved by the robotic arm.",
      "state_changes": "Grasped by the robotic arm, lifted, moved, and released on the orange book."
    }
  ],
  "background_setting": "An indoor setting with a light wooden table and a plain white wall in the background. A dark curtain is visible on the left side of the frame.",
  "lighting": {
    "conditions": "Even, artificial indoor lighting.",
    "direction": "Top-down lighting.",
    "shadows": "Soft shadows cast by objects on the table.",
    "illumination_effect": "Even illumination across the scene."
  },
  "aesthetics": {
    "composition": "Centered composition with the robotic arm on the left and objects arranged on the table.",
    "color_scheme": "Neutral tones with pops of color from the objects on the table.",
    "mood_atmosphere": "Functional and focused.",
    "patterns": "Wood grain pattern on the table."
  },
  "cinematography": {
    "camera_angle": "High angle, looking down at the table.",
    "framing": "Wide shot capturing the entire table and the robotic arm.",
    "depth_of_field": "Deep depth of field, with all objects in focus.",
    "focus": "Sharp focus on the robotic arm and objects on the table.",
    "lens_focal_length": "Wide-angle lens.",
    "camera_motion": "Static camera."
  },
  "style_medium": "3D animation",
  "artistic_style": "Realistic 3D rendering",
  "context": "A robotic arm is shown manipulating objects on a table in a simulated environment.",
  "actions": [
    {
      "time": "0.0s-1.0s",
      "description": "The robotic arm is stationary, with its end-effector hovering above the table.",
      "actor": "Robotic arm",
      "object": "Robotic arm",
      "contact_event": "None",
      "state_change": "No significant change."
    },
    {
      "time": "1.0s-4.0s",
      "description": "The robotic arm moves its end-effector towards the small gray object near the wall.",
      "actor": "Robotic arm",
      "object": "Small gray object",
      "contact_event": "The end-effector makes contact with the small gray object.",
      "state_change": "The small gray object is grasped by the end-effector."
    },
    {
      "time": "4.0s-7.0s",
      "description": "The robotic arm lifts the small gray object and moves it over the orange book.",
      "actor": "Robotic arm",
      "object": "Small gray object",
      "contact_event": "The end-effector lifts the small gray object.",
      "state_change": "The small gray object is lifted and moved."
    },
    {
      "time": "7.0s-10.0s",
      "description": "The robotic arm releases the small gray object on top of the orange book.",
      "actor": "Robotic arm",
      "object": "Small gray object",
      "contact_event": "The end-effector releases the small gray object.",
      "state_change": "The small gray object is placed on top of the orange book."
    },
    {
      "time": "10.0s-12.9s",
      "description": "The robotic arm returns to its initial position.",
      "actor": "Robotic arm",
      "object": "Robotic arm",
      "contact_event": "None",
      "state_change": "No significant change."
    }
  ],
  "text_and_signage_elements": [
    {
      "text": "Unreadable text on the orange book cover.",
      "category": "physical_in_scene",
      "appearance": "Colorful illustrations and text on the book cover.",
      "spatial": "Right side of the frame, on the orange book.",
      "spatial_temporal": "Static throughout the video.",
      "context": "Text on the book cover."
    }
  ],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0.0s-1.0s",
      "description": "The robotic arm is stationary, with its end-effector hovering above the table.",
      "key_changes": "None",
      "camera": "Static camera.",
      "robot_motion": "Static robotic arm.",
      "object_motion": "None",
      "physical_reasoning": "None"
    },
    {
      "segment_index": 1,
      "time_range": "1.0s-4.0s",
      "description": "The robotic arm moves its end-effector towards the small gray object near the wall.",
      "key_changes": "The robotic arm moves towards the small gray object.",
      "camera": "Static camera.",
      "robot_motion": "Moving robotic arm.",
      "object_motion": "None",
      "physical_reasoning": "The robotic arm moves towards the small gray object."
    },
    {
      "segment_index": 2,
      "time_range": "4.0s-7.0s",
      "description": "The robotic arm lifts the small gray object and moves it over the orange book.",
      "key_changes": "The robotic arm lifts and moves the small gray object.",
      "camera": "Static camera.",
      "robot_motion": "Moving robotic arm.",
      "object_motion": "The small gray object is lifted and moved.",
      "physical_reasoning": "The robotic arm lifts and moves the small gray object."
    },
    {
      "segment_index": 3,
      "time_range": "7.0s-10.0s",
      "description": "The robotic arm releases the small gray object on top of the orange book.",
      "key_changes": "The robotic arm releases the small gray object.",
      "camera": "Static camera.",
      "robot_motion": "Moving robotic arm.",
      "object_motion": "The small gray object is placed on top of the orange book.",
      "physical_reasoning": "The robotic arm releases the small gray object."
    },
    {
      "segment_index": 4,
      "time_range": "10.0s-12.9s",
      "description": "The robotic arm returns to its initial position.",
      "key_changes": "The robotic arm returns to its initial position.",
      "camera": "Static camera.",
      "robot_motion": "Moving robotic arm.",
      "object_motion": "None",
      "physical_reasoning": "The robotic arm returns to its initial position."
    }
  ],
  "transitions": [
    "None"
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "4:3",
  "duration": "12.967s",
  "fps": 30,
  "robotics": {
    "embodiment": "Robotic arm with a white body and green indicator light.",
    "end_effector": "Black end-effector with a silver tool attached.",
    "manipulated_objects": [
      "Small gray object"
    ],
    "task_goal": "Move the small gray object from near the wall to on top of the orange book.",
    "initial_state": "The robotic arm is stationary, with its end-effector hovering above the table.",
    "terminal_state": "The robotic arm has moved the small gray object to on top of the orange book and returned to its initial position.",
    "affordances": "The robotic arm has a gripper that can grasp and manipulate objects.",
    "failure_modes": "The robotic arm may fail to grasp the small gray object or may drop it."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The robotic arm's end-effector makes contact with the small gray object and grasps it.",
    "support_and_stability": "The objects on the table are supported by the table surface.",
    "force_and_motion": "The robotic arm applies force to lift and move the small gray object.",
    "causal_chain": "The robotic arm moves towards the small gray object, grasps it, lifts it, moves it over the orange book, and releases it.",
    "object_permutation": "The small gray object is moved from near the wall to on top of the orange book.",
    "next_action_prediction": "The robotic arm will likely return to its initial position."
  },
  "uncertainties": [],
  "comprehensive_t2v_caption": "A robotic arm with a white body and green indicator light is positioned on a light wooden table. The table holds various objects: a black and green sneaker, a purple mug, a small yellow object, a woven bracelet, an orange book with colorful illustrations, a black rectangular object, and a small gray object near the wall. The robotic arm moves its end-effector, which has a black and silver tool attached, to interact with the small gray object. The arm moves its end-effector towards the small gray object near the wall, grasps it, lifts it, moves it over the orange book, and releases it on top of the book. The arm then returns to its initial position. The background features a plain white wall and a dark curtain on the left. The lighting is even and artificial, typical of an indoor workspace or lab setting."
}
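
Review note: the key names in this output differ from those in the two Qwen outputs for the same prompt, for example "force_and_motion" and "object_permutation" under physical_reasoning where the Qwen outputs have "force_gravity_friction" and "object_permanence". Here is a minimal sketch for listing such drift. Treating the Qwen key set as the reference is an assumption, since the p08 schema itself is not reproduced in this review, and the function name is hypothetical.

```python
def key_drift(expected: set[str], actual: dict) -> dict[str, list[str]]:
    """List keys absent from the output and keys the reference does not contain."""
    actual_keys = set(actual)
    return {
        "missing": sorted(expected - actual_keys),
        "unexpected": sorted(actual_keys - expected),
    }


# Reference keys as they appear in the Qwen outputs (assumed reference, see above).
reference_keys = {
    "contact_and_grasp",
    "support_and_stability",
    "force_gravity_friction",
    "causal_chain",
    "object_permanence",
    "next_action_prediction",
}

# physical_reasoning keys from the Cosmos 3 Nano output (values omitted).
nano_physical_reasoning = dict.fromkeys([
    "contact_and_grasp",
    "support_and_stability",
    "force_and_motion",
    "causal_chain",
    "object_permutation",
    "next_action_prediction",
])

print(key_drift(reference_keys, nano_physical_reasoning))
```

The same comparison can be applied per nested object, such as each entry of "actions" or "segments", to surface fields like the absent "evidence" key.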

Cosmos 3 Super (64B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Super (64B) · 17.7s · 2026-06-10T16:12:03+0000

Short Read

The robotic arm extends its gripper towards the orange book on the table, makes contact, and appears to manipulate or lift it slightly.

Training Caption Field

A 3D animated video shows a robotic arm with a green illuminated base extending its gripper towards an orange book on a light wooden table. The table also holds a black and green sneaker, a purple mug, a silver spoon, a yellow banana, a woven ring, a black rectangular block, and a small black object in the background. The robotic arm moves from a resting position to an extended position, making contact with the orange book and slightly lifting or moving it.

Full output JSON/text
{
  "scene_imagination": "A robotic arm with a green illuminated base is positioned on a light wooden table. Various objects are scattered across the table: a black and green sneaker, a purple mug, a silver spoon, a yellow banana, an orange book with a cartoon cover, a woven ring, a black rectangular block, and a small black object in the background. The robot arm moves its gripper towards the orange book, interacting with it.",
  "temporal_caption": "The robotic arm extends its gripper towards the orange book on the table, makes contact, and appears to manipulate or lift it slightly.",
  "audio_description": "No significant audio elements are present in the video.",
  "subjects": [
    {
      "description": "A robotic arm with a white, cylindrical body and a green illuminated base. It has a black cable extending from its base.",
      "appearance_details": "White body, green illuminated base, black cable.",
      "relationship": "Interacting with objects on the table.",
      "location": "Left side of the frame.",
      "relative_size": "Large",
      "orientation": "Facing the center of the table.",
      "pose": "Extended towards the center of the table.",
      "action": "Moving its gripper towards the orange book.",
      "state_changes": "The arm moves from a resting position to an extended position."
    },
    {
      "description": "A black and green sneaker.",
      "appearance_details": "Black upper, green accents, white sole.",
      "relationship": "Resting on the table.",
      "location": "Center-left of the table.",
      "relative_size": "Medium",
      "orientation": "Facing upwards.",
      "pose": "Resting on the table.",
      "action": "None",
      "state_changes": "No significant change."
    },
    {
      "description": "A purple mug.",
      "appearance_details": "Purple exterior, white interior.",
      "relationship": "Resting on the table.",
      "location": "Center of the table.",
      "relative_size": "Small",
      "orientation": "Upright.",
      "pose": "Resting on the table.",
      "action": "None",
      "state_changes": "No significant change."
    },
    {
      "description": "A silver spoon.",
      "appearance_details": "Silver metal, curved handle.",
      "relationship": "Resting on the table.",
      "location": "Center-right of the table.",
      "relative_size": "Small",
      "orientation": "Lying flat.",
      "pose": "Resting on the table.",
      "action": "None",
      "state_changes": "No significant change."
    },
    {
      "description": "A yellow banana.",
      "appearance_details": "Yellow peel, curved shape.",
      "relationship": "Resting on the table.",
      "location": "Center-right of the table.",
      "relative_size": "Small",
      "orientation": "Lying flat.",
      "pose": "Resting on the table.",
      "action": "None",
      "state_changes": "No significant change."
    },
    {
      "description": "An orange book with a cartoon cover.",
      "appearance_details": "Orange cover, cartoon illustration.",
      "relationship": "Resting on the table.",
      "location": "Center-right of the table.",
      "relative_size": "Medium",
      "orientation": "Lying flat.",
      "pose": "Resting on the table.",
      "action": "Being manipulated by the robotic arm.",
      "state_changes": "The book is slightly lifted or moved by the robotic arm."
    },
    {
      "description": "A woven ring.",
      "appearance_details": "Brown woven material.",
      "relationship": "Resting on the table.",
      "location": "Center-right of the table.",
      "relative_size": "Small",
      "orientation": "Lying flat.",
      "pose": "Resting on the table.",
      "action": "None",
      "state_changes": "No significant change."
    },
    {
      "description": "A black rectangular block.",
      "appearance_details": "Solid black, rectangular shape.",
      "relationship": "Resting on the table.",
      "location": "Right side of the table.",
      "relative_size": "Small",
      "orientation": "Lying flat.",
      "pose": "Resting on the table.",
      "action": "None",
      "state_changes": "No significant change."
    },
    {
      "description": "A small black object.",
      "appearance_details": "Small, black, possibly a toy or tool.",
      "relationship": "Resting on the table.",
      "location": "Top-right corner of the table.",
      "relative_size": "Small",
      "orientation": "Lying flat.",
      "pose": "Resting on the table.",
      "action": "None",
      "state_changes": "No significant change."
    }
  ],
  "background_setting": "A light wooden table with various objects scattered across it. The background is a plain white wall.",
  "lighting": {
    "conditions": "Bright indoor lighting.",
    "direction": "Top-lit.",
    "shadows": "Soft shadows cast by objects on the table.",
    "illumination_effect": "Even illumination across the scene."
  },
  "aesthetics": {
    "composition": "Centered on the robotic arm and the objects on the table.",
    "color_scheme": "Warm tones with pops of color from the objects.",
    "mood_atmosphere": "Neutral, focused on the robotic interaction.",
    "patterns": "Wood grain pattern on the table."
  },
  "cinematography": {
    "camera_motion": "Static camera.",
    "framing": "Wide shot.",
    "camera_angle": "High angle.",
    "depth_of_field": "Deep focus.",
    "focus": "Sharp focus on the robotic arm and objects.",
    "lens_focal_length": "Standard lens."
  },
  "style_medium": "3D animation.",
  "artistic_style": "Realistic 3D rendering.",
  "context": "A demonstration of a robotic arm interacting with objects on a table.",
  "actions": [
    {
      "time": "0-6s",
      "description": "The robotic arm extends its gripper towards the orange book on the table.",
      "actor": "Robotic arm",
      "object": "Orange book",
      "contact_event": "The gripper makes contact with the book.",
      "state_change": "The book is slightly lifted or moved."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0-6s",
      "description": "The robotic arm extends its gripper towards the orange book on the table.",
      "key_changes": "The robotic arm moves from a resting position to an extended position.",
      "camera": "Static camera.",
      "robot_motion": "The robotic arm extends its gripper.",
      "object_motion": "The orange book is slightly lifted or moved.",
      "physical_logic": "The robotic arm applies force to the book."
    }
  ],
  "transitions": [],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "16,9",
  "duration": "6s",
  "fps": 30,
  "robotics": {
    "embodiment": "A robotic arm with a white, cylindrical body and a green illuminated base.",
    "end_effector": "A gripper at the end of the robotic arm.",
    "manipulated_objects": [
      "Orange book"
    ],
    "task_goal": "To manipulate the orange book on the table.",
    "initial_state": "The robotic arm is resting on the table.",
    "terminal_state": "The robotic arm has extended its gripper towards the orange book.",
    "affordances": "The robotic arm can extend its gripper and manipulate objects.",
    "failure_modes": "The robotic arm may not be able to grasp the book properly."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The robotic arm makes contact with the orange book and appears to grasp it.",
    "support_and_stability": "The robotic arm is supported by its base on the table.",
    "force_gravity_friction": "The robotic arm applies force to the book, overcoming gravity and friction.",
    "causal_chain": "The robotic arm extends its gripper, makes contact with the book, and applies force to manipulate it.",
    "object_permanence": "The objects on the table remain in their positions except for the orange book, which is slightly moved.",
    "next_action_prediction": "The robotic arm will likely continue to manipulate the orange book or move to another object."
  },
  "uncertainties": [],
  "comprehensive_t2v_caption": "A 3D animated video shows a robotic arm with a green illuminated base extending its gripper towards an orange book on a light wooden table. The table also holds a black and green sneaker, a purple mug, a silver spoon, a yellow banana, a woven ring, a black rectangular block, and a small black object in the background. The robotic arm moves from a resting position to an extended position, making contact with the orange book and slightly lifting or moving it."
}
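
Reviewer check (sketch): the `resolution` and `aspect_ratio` fields in the output above can be compared mechanically. The following is a minimal Python sketch, assuming the reported ratio string uses either a comma or a colon as separator. The helper names `reduced_ratio`, `parse_reported_ratio`, and `ratio_matches` are hypothetical and not part of either prompt schema. For the values above, 1152x816 reduces to 24:17, which does not agree with the reported "16,9".

```python
from math import gcd


def reduced_ratio(width, height):
    """Reduce a pixel resolution to its smallest integer ratio."""
    d = gcd(width, height)
    return width // d, height // d


def parse_reported_ratio(text):
    """Parse a ratio string such as "16,9" or "4:3" (assumed separators)."""
    w, h = text.replace(":", ",").split(",")
    return int(w), int(h)


def ratio_matches(width, height, reported, tolerance=0.01):
    """True when the reported ratio agrees with the resolution within tolerance."""
    rw, rh = parse_reported_ratio(reported)
    return abs(width / height - rw / rh) <= tolerance


# Values copied from the JSON output above.
print(reduced_ratio(1152, 816))          # (24, 17)
print(ratio_matches(1152, 816, "16,9"))  # False
```

The tolerance is an arbitrary choice for illustration; a stricter or looser threshold may suit other clips.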

Cosmos 3 Super (64B) / p09: Reproduction schema

ok
p09: Reproduction schema · Cosmos 3 Super (64B) · 23.9s · 2026-06-10T18:54:16+0000

Short Read

A robotic arm moves a small gray object from the table onto an orange book.

Training Caption Field

A robotic arm with a white body and black cables moves from a raised position to grasp a small gray object on a light wood table. The arm then lifts the object and places it on top of an orange book with a colorful illustration. Other objects on the table, including a black mug, a purple cup, a black and green object, a circular wooden ring, and a black rectangular block, remain stationary throughout the sequence. The camera remains static, providing a high-angle side view of the entire workspace.

Full output JSON/text
{
  "schema_version": "p09_reproduction_v1",
  "one_sentence_summary": "A robotic arm moves a small gray object from the table onto an orange book.",
  "video_metadata": {
    "resolution": {
      "W": 1152,
      "H": 816
    },
    "aspect_ratio": "16,9",
    "duration_sec": 12.967,
    "fps": 30.0,
    "frame_count": 389
  },
  "camera": {
    "viewpoint": "high-angle side view",
    "framing": "medium shot",
    "camera_angle": "high angle",
    "camera_motion": "static",
    "estimated_height_or_distance": "1.5 meters above the table",
    "lens_or_distortion": "standard lens, no visible distortion",
    "crop_boundaries": "full table visible within frame",
    "stability": "static"
  },
  "scene_layout": {
    "coordinate_frame": "table surface as primary plane, robot base at left, objects clustered center-right",
    "background": "plain white wall behind table, dark curtain visible at far left edge",
    "surface": "light wood-grain table with visible grain lines",
    "lighting": "even indoor lighting, soft shadows beneath objects",
    "global_spatial_map": "robot base anchored at left foreground; orange book center-right; black mug, purple cup, black and green object, circular wooden ring, black rectangular block, and small gray object arranged around the book; small black and orange device in upper right corner",
    "occlusions": "robot arm partially occludes left side of table; objects near center overlap slightly"
  },
  "subjects": [
    {
      "id": "robot_arm",
      "category": "robotic manipulator",
      "appearance": "white articulated arm with black cables, green illuminated base, black gripper tool at end",
      "visible_parts": "base, shoulder joint, upper arm, forearm, wrist, gripper",
      "initial_pose": "arm extended upward, gripper open above table",
      "final_pose": "arm retracted slightly, gripper open after placing object",
      "workspace_location": "left side of table, extending toward center-right",
      "motion_path": "arm lowers, moves rightward, places object, retracts",
      "end_effector": "black multi-finger gripper",
      "uncertainties": "no significant uncertainties"
    }
  ],
  "objects": [
    {
      "id": "orange_book",
      "category": "book",
      "appearance": "orange cover with colorful illustration",
      "label_or_text": "illustrated cover, text illegible",
      "material": "paper",
      "shape": "rectangular flat",
      "size_relative_to_frame": "medium",
      "initial_pose": "lying flat center-right",
      "final_pose": "lying flat with small gray object on top",
      "state_changes": "receives small gray object on its cover"
    },
    {
      "id": "small_gray_object",
      "category": "small object",
      "appearance": "gray, irregular shape, possibly a toy or tool",
      "label_or_text": "none",
      "material": "plastic or rubber",
      "shape": "irregular",
      "size_relative_to_frame": "small",
      "initial_pose": "resting on table near upper right",
      "final_pose": "resting on top of orange book",
      "state_changes": "moved from table to book by robot arm"
    },
    {
      "id": "black_mug",
      "category": "mug",
      "appearance": "black ceramic with handle",
      "label_or_text": "none",
      "material": "ceramic",
      "shape": "cylindrical with handle",
      "size_relative_to_frame": "small",
      "initial_pose": "upright near center",
      "final_pose": "upright near center, unchanged"
    },
    {
      "id": "purple_cup",
      "category": "cup",
      "appearance": "purple translucent plastic",
      "label_or_text": "none",
      "material": "plastic",
      "shape": "cylindrical",
      "size_relative_to_frame": "small",
      "initial_pose": "upright near center",
      "final_pose": "upright near center, unchanged"
    },
    {
      "id": "black_and_green_object",
      "category": "unknown object",
      "appearance": "black body with green accents, possibly a tool or container",
      "label_or_text": "none",
      "material": "plastic",
      "shape": "irregular",
      "size_relative_to_frame": "small",
      "initial_pose": "resting on table near center",
      "final_pose": "resting on table near center, unchanged"
    },
    {
      "id": "circular_wooden_ring",
      "category": "ring",
      "appearance": "light brown wood, circular",
      "label_or_text": "none",
      "material": "wood",
      "shape": "circular hoop",
      "size_relative_to_frame": "small",
      "initial_pose": "lying flat near center",
      "final_pose": "lying flat near center, unchanged"
    },
    {
      "id": "black_rectangular_block",
      "category": "block",
      "appearance": "solid black, rectangular",
      "label_or_text": "none",
      "material": "plastic or rubber",
      "shape": "rectangular prism",
      "size_relative_to_frame": "small",
      "initial_pose": "resting on table right side",
      "final_pose": "resting on table right side, unchanged"
    },
    {
      "id": "small_black_and_orange_device",
      "category": "device",
      "appearance": "small black body with orange accents",
      "label_or_text": "none",
      "material": "plastic",
      "shape": "irregular",
      "size_relative_to_frame": "small",
      "initial_pose": "resting on table upper right corner",
      "final_pose": "resting on table upper right corner, unchanged"
    }
  ],
  "initial_state": "Robot arm is raised above the table. All objects are stationary on the table surface. Small gray object is on the table near the upper right.",
  "final_state": "Robot arm has placed the small gray object on top of the orange book. All other objects remain in their original positions.",
  "timeline": [
    {
      "time_range": "0:00-0:03",
      "description": "Robot arm begins to lower and move toward the small gray object.",
      "active_subjects": [
        "robot_arm"
      ],
      "object_states": {
        "small_gray_object": "stationary on table"
      },
      "robot_motion": "lowering and moving rightward",
      "object_motion": "none",
      "gripper_state": "open"
    },
    {
      "time_range": "0:03-0:06",
      "description": "Robot arm reaches the small gray object and closes its gripper around it.",
      "active_subjects": [
        "robot_arm"
      ],
      "object_states": {
        "small_gray_object": "grasped by robot arm"
      },
      "robot_motion": "continues moving rightward, then closes gripper",
      "object_motion": "lifted off table",
      "gripper_state": "closed"
    },
    {
      "time_range": "0:06-0:09",
      "description": "Robot arm lifts the small gray object and moves it over the orange book.",
      "active_subjects": [
        "robot_arm"
      ],
      "object_states": {
        "small_gray_object": "held above orange book"
      },
      "robot_motion": "lifting and moving rightward",
      "object_motion": "none",
      "gripper_state": "closed"
    },
    {
      "time_range": "0:09-0:12",
      "description": "Robot arm lowers the small gray object onto the orange book and releases it.",
      "active_subjects": [
        "robot_arm"
      ],
      "object_states": {
        "small_gray_object": "resting on orange book"
      },
      "robot_motion": "lowering",
      "object_motion": "none",
      "gripper_state": "open"
    }
  ],
  "parallel_action_graph": {
    "simultaneous_actions": [],
    "dependencies": [
      {
        "action": "grasp small_gray_object",
        "depends_on": "move toward small_gray_object"
      },
      {
        "action": "place small_gray_object on orange book",
        "depends_on": "grasp small_gray_object"
      }
    ],
    "ordering_constraints": [
      "move toward small_gray_object",
      "grasp small_gray_object",
      "lift and move over orange book",
      "lower and release on orange book"
    ]
  },
  "keyframes": [
    {
      "time": "0:00",
      "frame_description": "Robot arm is raised, all objects stationary on table.",
      "subject_poses": {
        "robot_arm": "raised"
      },
      "object_poses": {
        "orange_book": "lying flat",
        "small_gray_object": "on table"
      },
      "contacts": []
    },
    {
      "time": "0:03",
      "frame_description": "Robot arm reaches the small gray object and grasps it.",
      "subject_poses": {
        "robot_arm": "lowered, gripper closed"
      },
      "object_poses": {
        "small_gray_object": "grasped by robot arm"
      },
      "contacts": [
        {
          "subject_id": "robot_arm",
          "object_id": "small_gray_object",
          "contact_type": "gripper grasp"
        }
      ]
    },
    {
      "time": "0:06",
      "frame_description": "Robot arm lifts the small gray object and moves it over the orange book.",
      "subject_poses": {
        "robot_arm": "lifted, moving rightward"
      },
      "object_poses": {
        "small_gray_object": "held above orange book"
      },
      "contacts": []
    },
    {
      "time": "0:09",
      "frame_description": "Robot arm lowers the small gray object onto the orange book.",
      "subject_poses": {
        "robot_arm": "lowered, gripper open"
      },
      "object_poses": {
        "small_gray_object": "resting on orange book"
      },
      "contacts": [
        {
          "subject_id": "robot_arm",
          "object_id": "small_gray_object",
          "contact_type": "release"
        }
      ]
    }
  ],
  "contact_events": [
    {
      "time": "0:03",
      "subject_id": "robot_arm",
      "object_id": "small_gray_object",
      "contact_type": "gripper grasp",
      "contact_location": "small gray object on table",
      "grasp_or_support_quality": "secure grasp",
      "release_or_transfer": "grasp"
    },
    {
      "time": "0:09",
      "subject_id": "robot_arm",
      "object_id": "small_gray_object",
      "contact_type": "release",
      "contact_location": "orange book",
      "grasp_or_support_quality": "release",
      "release_or_transfer": "release"
    }
  ],
  "physics_and_affordances": {
    "support_surfaces": "table surface supports all objects; orange book supports small gray object at end"
  },
  "appearance_fidelity": {
    "colors": "white robot arm, light wood table, orange book, black mug, purple cup, green accents, black and orange device",
    "textures": "wood grain on table, smooth plastic on robot, matte finish on objects",
    "lighting_shadows": "soft shadows beneath objects, even lighting",
    "robot_morphology": "white articulated arm with black cables and green illuminated base",
    "object_fine_details": "illustrated cover on orange book, irregular shape of small gray object"
  },
  "text_and_labels": [
    {
      "object_id": "orange_book",
      "raw_text": "illustrated cover, text illegible",
      "legibility": "illegible",
      "appearance": "colorful illustration on orange background",
      "spatial_location": "center-right",
      "time_visible": "entire video"
    }
  ],
  "audio_description": "mechanical whirring of robot arm, soft click when gripper closes, faint thud when object is placed on book",
  "generation_controls": {
    "shot_type": "medium shot",
    "camera_motion_instruction": "static",
    "motion_style": "smooth robotic motion",
    "duration_instruction": "12.967 seconds",
    "spatial_consistency_requirements": "objects remain stationary except for the small gray object",
    "negative_constraints": "no collisions, no object movement except small gray object",
    "recommended_caption_style": "descriptive, factual"
  },
  "uncertainties": [],
  "comprehensive_reproduction_caption": "A robotic arm with a white body and black cables moves from a raised position to grasp a small gray object on a light wood table. The arm then lifts the object and places it on top of an orange book with a colorful illustration. Other objects on the table, including a black mug, a purple cup, a black and green object, a circular wooden ring, and a black rectangular block, remain stationary throughout the sequence. The camera remains static, providing a high-angle side view of the entire workspace."
}
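
Reviewer check (sketch): the `video_metadata` and `timeline` blocks above can be tested for internal consistency. This is a minimal Python sketch using values copied from the output; the helper names are hypothetical and the "M:SS-M:SS" range format is assumed from the entries shown. It suggests that 12.967 s at 30 fps rounds to the reported 389 frames, and that the timeline stops at 0:12, leaving just under one second of the clip undescribed.

```python
def expected_frames(duration_sec, fps):
    """Frame count implied by duration and frame rate, rounded to the nearest frame."""
    return round(duration_sec * fps)


def mmss_to_seconds(stamp):
    """Convert an "M:SS" stamp to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def uncovered_tail(time_ranges, duration_sec):
    """Seconds between the last timeline end and the end of the clip."""
    last_end = max(mmss_to_seconds(r.split("-")[1]) for r in time_ranges)
    return max(0.0, duration_sec - last_end)


# Values copied from the JSON output above.
timeline = ["0:00-0:03", "0:03-0:06", "0:06-0:09", "0:09-0:12"]
print(expected_frames(12.967, 30.0))               # 389
print(round(uncovered_tail(timeline, 12.967), 3))  # 0.967
```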

rovid_005

Qwen 3.5 9B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 9B · 59.6s · 2026-06-10T17:47:07+0000

Short Read

At the beginning, two robotic grippers are positioned near the bottom hem of a hanging green shirt. In the early phase, the right robotic arm moves its gripper towards the right side of the shirt's hem. By the middle, the arm lifts a white cylindrical object and positions it near the shirt. Towards the end, the arm lowers the white object and presses it firmly against the bottom center of the shirt, causing the fabric to bunch up.

Training Caption Field

A light green, short-sleeved button-up shirt hangs on a white mannequin against a grey wall. Two robotic arms with black grippers are positioned at the bottom of the frame. The right robotic arm moves towards the shirt, lifts a white cylindrical object, and then presses it firmly against the bottom center of the shirt, causing the fabric to bunch up and wrinkle. The left arm remains stationary throughout the sequence.

Full output JSON/text
{
  "scene_imagination": "A close-up view of a light green, short-sleeved button-up shirt hanging on a white mannequin torso against a plain grey wall. Two robotic arms with black grippers are positioned at the bottom of the frame, facing the shirt. The left arm remains stationary while the right arm moves to interact with the fabric.",
  "temporal_caption": "At the beginning, two robotic grippers are positioned near the bottom hem of a hanging green shirt. In the early phase, the right robotic arm moves its gripper towards the right side of the shirt's hem. By the middle, the arm lifts a white cylindrical object and positions it near the shirt. Towards the end, the arm lowers the white object and presses it firmly against the bottom center of the shirt, causing the fabric to bunch up.",
  "audio_description": null,
  "subjects": [
    {
      "description": "Green button-up shirt",
      "appearance_details": "Light green, short-sleeved, wrinkled fabric, white buttons, chest pocket",
      "relationship": "Hanging on a mannequin",
      "location": "Center of the frame",
      "relative_size": "Large, fills most of the vertical frame",
      "orientation": "Vertical, facing forward",
      "pose": "Hanging",
      "action": "Being manipulated by a robot",
      "state_changes": "Fabric bunches and wrinkles increase at the bottom due to pressure"
    },
    {
      "description": "White robotic arm",
      "appearance_details": "White articulated arm with black gripper",
      "relationship": "Interacting with the shirt",
      "location": "Right side of the frame",
      "relative_size": "Medium, partially visible",
      "orientation": "Angled towards the shirt",
      "pose": "Dynamic movement",
      "action": "Moving and pressing against the shirt",
      "state_changes": "Moves from right to center, lowers a white object"
    },
    {
      "description": "Black robotic gripper",
      "appearance_details": "Black, multi-fingered mechanical hand",
      "relationship": "Attached to the white robotic arm",
      "location": "Bottom right of the frame",
      "relative_size": "Small",
      "orientation": "Facing the shirt",
      "pose": "Open and closed",
      "action": "Holding and manipulating the white object",
      "state_changes": "Closes around the white object"
    }
  ],
  "background_setting": "A plain, light grey wall with a dark baseboard visible at the bottom. The environment appears to be an indoor laboratory or testing area.",
  "lighting": {
    "conditions": "Even, diffuse indoor lighting",
    "direction": "Frontal, slightly from above",
    "shadows": "Soft shadows cast by the shirt and robot onto the wall",
    "illumination_effect": "Clear visibility of fabric texture and robot details"
  },
  "aesthetics": {
    "composition": "Centered subject with robotic elements at the bottom",
    "color_scheme": "Muted tones: light green shirt, grey background, white robot, black gripper",
    "mood_atmosphere": "Clinical, technical, focused",
    "patterns": "Wrinkled fabric texture on the shirt"
  },
  "cinematography": {
    "framing": "Medium close-up",
    "camera_angle": "Eye-level with the shirt",
    "depth_of_field": "Deep focus, keeping both shirt and robot in sharp detail",
    "focus": "Sharp on the shirt and robot grippers",
    "lens_focal_length": "Standard or slightly wide angle",
    "camera_motion": "Static"
  },
  "style_medium": "robotics video",
  "artistic_style": "real-world robot manipulation footage",
  "context": "A demonstration of a robot learning to manipulate clothing, specifically folding or smoothing a shirt.",
  "actions": [
    {
      "time": "0:00 - 0:05",
      "description": "The right robotic arm moves its gripper towards the right side of the shirt's hem.",
      "actor": "Right robotic arm",
      "object": "Green shirt",
      "contact_event": "No contact",
      "state_change": "None",
      "evidence": "Movement of the arm"
    },
    {
      "time": "0:05 - 0:10",
      "description": "The right arm lifts a white cylindrical object and positions it near the shirt.",
      "actor": "Right robotic arm",
      "object": "White object",
      "contact_event": "None",
      "state_change": "Object is lifted",
      "evidence": "Visual tracking of the white object"
    },
    {
      "time": "0:10 - 0:15",
      "description": "The arm lowers the white object and presses it firmly against the bottom center of the shirt.",
      "actor": "Right robotic arm",
      "object": "Green shirt",
      "contact_event": "Firm contact and pressure applied",
      "state_change": "Fabric bunches and wrinkles increase at the bottom",
      "evidence": "Visible deformation of the shirt fabric"
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "0:00 - 0:05",
      "description": "Initial positioning of the right robotic arm near the shirt.",
      "key_changes": "Arm movement",
      "camera": "Static",
      "robot_motion": "Right arm moves right",
      "object_motion": "Shirt remains static",
      "physical_logic": "Approach phase"
    },
    {
      "segment_index": 2,
      "time_range": "0:05 - 0:10",
      "description": "The robot lifts a white object into the frame.",
      "key_changes": "Introduction of new object",
      "camera": "Static",
      "robot_motion": "Arm lifts object",
      "object_motion": "Object moves up",
      "physical_logic": "Object retrieval or positioning"
    },
    {
      "segment_index": 3,
      "time_range": "0:10 - 0:15",
      "description": "The robot presses the white object against the shirt.",
      "key_changes": "Fabric deformation",
      "camera": "Static",
      "robot_motion": "Arm lowers and presses",
      "object_motion": "Object moves down and presses",
      "physical_logic": "Force application to manipulate fabric"
    }
  ],
  "transitions": [
    "Movement of the right arm from right to center",
    "Introduction of the white object",
    "Contact and pressure application"
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "4:3",
  "duration": "15.367s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Dual-arm robotic system",
    "end_effector": "Black multi-fingered gripper",
    "manipulated_objects": "Green shirt, white cylindrical object",
    "task_goal": "Folding or smoothing the shirt",
    "initial_state": "Robot arms positioned below the shirt",
    "terminal_state": "Robot arm pressing against the shirt",
    "affordances": "Gripper allows grasping and pressing",
    "failure_modes": "None observed"
  },
  "physical_reasoning": {
    "contact_and_grasp": "The gripper makes firm contact with the shirt and the white object.",
    "support_and_stability": "The shirt is supported by the mannequin; the robot provides external force.",
    "force_gravity_friction": "Gravity holds the shirt down; friction between the gripper and object facilitates manipulation.",
    "causal_chain": "Robot moves arm -> lifts object -> presses object -> fabric deforms.",
    "object_permanence": "The shirt and robot remain intact throughout the interaction.",
    "next_action_prediction": "The robot may continue to press or move the object to fold the shirt."
  },
  "uncertainties": [
    "The exact purpose of the white object is unclear.",
    "The specific folding algorithm or strategy used by the robot is not visible."
  ],
  "comprehensive_t2v_caption": "A light green, short-sleeved button-up shirt hangs on a white mannequin against a grey wall. Two robotic arms with black grippers are positioned at the bottom of the frame. The right robotic arm moves towards the shirt, lifts a white cylindrical object, and then presses it firmly against the bottom center of the shirt, causing the fabric to bunch up and wrinkle. The left arm remains stationary throughout the sequence."
}
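
Reviewer check (sketch): the outputs in this review write time ranges in several forms ("0-6s", "0:00-0:03", "0:00 - 0:05", "00:00 - 00:05"), so comparing action timing across models needs a common unit. The following is a minimal Python sketch that converts those forms to seconds. It assumes only the forms listed here; `to_seconds` and `parse_range` are hypothetical helper names, not part of any schema.

```python
import re

# Optional "M:" prefix, a seconds value, and an optional trailing "s".
_TOKEN = re.compile(r"(?:(\d+):)?(\d+(?:\.\d+)?)s?")


def to_seconds(token):
    """Convert "6s", "0:05", "00:05", or "15.367s" to seconds."""
    match = _TOKEN.fullmatch(token.strip())
    if match is None:
        raise ValueError(f"unrecognised time token: {token!r}")
    minutes = int(match.group(1)) if match.group(1) else 0
    return minutes * 60 + float(match.group(2))


def parse_range(text):
    """Convert a "start-end" range string to a (start, end) tuple in seconds."""
    start, end = text.split("-")
    return to_seconds(start), to_seconds(end)


# Range strings copied from the outputs in this review.
for sample in ["0-6s", "0:00-0:03", "0:10 - 0:15", "00:00 - 00:05"]:
    print(sample, "->", parse_range(sample))

# The last action above ends at 15 s; the reported duration is "15.367s".
print(to_seconds("15.367s") - parse_range("0:10 - 0:15")[1])
```

Ranges that use a different separator or include hours would need extra handling.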

Qwen 3.5 27B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 27B · 196.0s · 2026-06-10T18:11:23+0000

Short Read

At the beginning, the scene is static with the robot arms positioned below the shirt. In the early phase, the right robotic arm begins to move, lifting the white steamer. During the middle phase, the steamer is raised and angled towards the lower right side of the shirt. In the late phase, the steamer is brought closer to the fabric, positioning the nozzle for steaming. By the end, the steamer is held steady against the shirt's hem.

Training Caption Field

A static, medium shot captures a wrinkled, light green short-sleeved button-up shirt hanging on a white mannequin torso against a plain, light grey wall. In the foreground, two black robotic grippers are positioned at the bottom of the frame. The right gripper holds a white handheld garment steamer. Initially, the scene is still. Then, the right robotic arm begins to move, lifting the steamer and angling it towards the lower right side of the shirt. The arm maneuvers the steamer with precision, bringing the nozzle close to the fabric's hem, positioning it for the steaming process. The lighting is diffuse and even, highlighting the texture of the wrinkled shirt and the clean lines of the robotic equipment. The video concludes with the steamer held steady against the shirt, poised to begin the task of removing wrinkles.

Full output JSON/text
{
  "scene_imagination": "A static, medium shot captures a wrinkled, light green short-sleeved button-up shirt hanging on a white mannequin torso against a plain, light grey wall. In the foreground, two black robotic grippers are positioned at the bottom of the frame. The right gripper is holding a white handheld garment steamer. The lighting is diffuse and even, casting soft shadows behind the mannequin and the robot arms.",
  "temporal_caption": "At the beginning, the scene is static with the robot arms positioned below the shirt. In the early phase, the right robotic arm begins to move, lifting the white steamer. During the middle phase, the steamer is raised and angled towards the lower right side of the shirt. In the late phase, the steamer is brought closer to the fabric, positioning the nozzle for steaming. By the end, the steamer is held steady against the shirt's hem.",
  "audio_description": null,
  "subjects": [
    {
      "description": "A light green, short-sleeved button-up shirt.",
      "appearance_details": "The shirt is made of a lightweight fabric, possibly linen or cotton, and is visibly wrinkled. It has white buttons and a single chest pocket on the left side.",
      "relationship": "The object being manipulated.",
      "location": "Center of the frame, hanging on a mannequin.",
      "relative_size": "Large, occupying the majority of the vertical space.",
      "orientation": "Front-facing.",
      "pose": "Hanging vertically.",
      "action": "Being steamed.",
      "state_changes": "Transitioning from a static state to being actively treated by the steamer."
    },
    {
      "description": "A white mannequin torso.",
      "appearance_details": "A simple, white, headless mannequin used to display the shirt.",
      "relationship": "Support for the shirt.",
      "location": "Behind the shirt.",
      "relative_size": "Medium.",
      "orientation": "Front-facing.",
      "pose": "Stationary.",
      "action": "Holding the shirt.",
      "state_changes": "None."
    },
    {
      "description": "Two robotic arms with black grippers.",
      "appearance_details": "The arms are black with visible joints and wiring. The grippers have a pincer-like design.",
      "relationship": "The agents performing the task.",
      "location": "Foreground, at the bottom of the frame.",
      "relative_size": "Medium.",
      "orientation": "Facing the shirt.",
      "pose": "The left arm is stationary; the right arm is active.",
      "action": "The right arm lifts and positions a steamer.",
      "state_changes": "The right arm moves from a resting position to an active manipulation pose."
    },
    {
      "description": "A white handheld garment steamer.",
      "appearance_details": "A modern, white steamer with a cylindrical nozzle and a handle.",
      "relationship": "The tool being used by the robot.",
      "location": "Held by the right robotic gripper.",
      "relative_size": "Small.",
      "orientation": "Angled towards the shirt.",
      "pose": "Being held and maneuvered.",
      "action": "Being used to steam the shirt.",
      "state_changes": "Moved from a resting position to an operational position against the fabric."
    }
  ],
  "background_setting": "A plain, light grey wall serves as the background, providing a neutral and non-distracting environment. The floor is a dark, solid color, visible at the very bottom of the frame.",
  "lighting": {
    "conditions": "Indoor, artificial lighting.",
    "direction": "Frontal and slightly from above.",
    "shadows": "Soft shadows are cast behind the mannequin and the robot arms, indicating a primary light source from the front.",
    "illumination_effect": "The lighting is even and bright enough to clearly show the texture and wrinkles of the shirt fabric."
  },
  "aesthetics": {
    "composition": "The composition is centered on the shirt, with the robot arms framing the bottom of the shot. The vertical lines of the shirt's buttons draw the eye upward.",
    "color_scheme": "Muted and neutral, dominated by the light green of the shirt, the white of the mannequin and steamer, and the grey of the background.",
    "mood_atmosphere": "Functional, clinical, and focused on the task of garment care.",
    "patterns": "The primary pattern is the natural wrinkling of the fabric."
  },
  "cinematography": {
    "framing": "Medium shot, capturing the torso of the mannequin and the robot arms.",
    "camera_angle": "Eye-level, straight-on.",
    "depth_of_field": "Deep, with both the foreground robot arms and the background shirt in focus.",
    "focus": "Sharp focus on the shirt and the robot's end-effectors.",
    "lens_focal_length": "Standard, likely around 35mm-50mm equivalent.",
    "camera_motion": "Static, no camera movement."
  },
  "style_medium": "robotics video",
  "artistic_style": "real-world robot manipulation footage",
  "context": "The video depicts a robot performing a household chore: steaming a wrinkled shirt. The task involves precise manipulation of a tool (the steamer) to interact with a deformable object (the shirt).",
  "actions": [
    {
      "time": "00:00 - 00:05",
      "description": "The scene is static. The robot arms are in a resting position below the shirt.",
      "actor": "Robot arms",
      "object": "None",
      "contact_event": "None",
      "state_change": "None",
      "evidence": "No movement is observed in the frames."
    },
    {
      "time": "00:06 - 00:10",
      "description": "The right robotic arm begins to lift the white steamer.",
      "actor": "Right robotic arm",
      "object": "Steamer",
      "contact_event": "Gripping",
      "state_change": "The steamer is raised from its initial position.",
      "evidence": "The steamer's position changes vertically."
    },
    {
      "time": "00:11 - 00:15",
      "description": "The right robotic arm maneuvers the steamer towards the lower right side of the shirt.",
      "actor": "Right robotic arm",
      "object": "Steamer, Shirt",
      "contact_event": "Approaching",
      "state_change": "The steamer is positioned for use.",
      "evidence": "The steamer is angled and brought close to the fabric."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "00:00 - 00:05",
      "description": "Initial static state of the scene.",
      "key_changes": "None.",
      "camera": "Static.",
      "robot_motion": "None.",
      "object_motion": "None.",
      "physical_logic": "The system is at rest before the task begins."
    },
    {
      "segment_index": 2,
      "time_range": "00:06 - 00:10",
      "description": "The right robot arm initiates movement, lifting the steamer.",
      "key_changes": "The steamer is lifted.",
      "camera": "Static.",
      "robot_motion": "Vertical lift of the right arm.",
      "object_motion": "The steamer moves upwards.",
      "physical_logic": "The robot overcomes gravity to lift the tool."
    },
    {
      "segment_index": 3,
      "time_range": "00:11 - 00:15",
      "description": "The robot positions the steamer against the shirt.",
      "key_changes": "The steamer is angled and brought into proximity with the shirt.",
      "camera": "Static.",
      "robot_motion": "Complex multi-joint movement to position the steamer.",
      "object_motion": "The steamer moves towards the shirt.",
      "physical_logic": "The robot demonstrates spatial awareness to avoid collision while positioning the tool for the task."
    }
  ],
  "transitions": [
    "From a static state to the initiation of the steaming task.",
    "From lifting the steamer to positioning it for use."
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "16:9",
  "duration": "15.367",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Dual-arm robotic system with black grippers.",
    "end_effector": "Pincer-style grippers.",
    "manipulated_objects": "White handheld garment steamer.",
    "task_goal": "To steam the wrinkles out of the green shirt.",
    "initial_state": "Robot arms at rest, shirt is wrinkled.",
    "terminal_state": "Steamer is positioned against the shirt, ready to operate.",
    "affordances": "The grippers afford grasping and manipulating the steamer. The steamer affords the application of heat and steam to fabric.",
    "failure_modes": "Potential failure includes dropping the steamer, colliding with the shirt or mannequin, or failing to position the steamer correctly."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The right gripper maintains a firm grasp on the steamer's handle throughout the movement.",
    "support_and_stability": "The mannequin provides stable support for the shirt, keeping it in place for the steaming process.",
    "force_gravity_friction": "The robot must apply sufficient force to lift the steamer against gravity. Friction between the gripper and the steamer handle is crucial for maintaining the grasp.",
    "causal_chain": "The robot's motor commands cause the arm to move, which in turn moves the steamer, positioning it to interact with the shirt.",
    "object_permanence": "The shirt and mannequin remain constant throughout the clip, demonstrating object permanence.",
    "next_action_prediction": "The next logical action is for the steamer to be activated, releasing steam onto the shirt to remove wrinkles."
  },
  "uncertainties": [
    "It is unclear if the steamer is already turned on or if it will be activated in the next step.",
    "The exact model and capabilities of the robotic arms are not specified."
  ],
  "comprehensive_t2v_caption": "A static, medium shot captures a wrinkled, light green short-sleeved button-up shirt hanging on a white mannequin torso against a plain, light grey wall. In the foreground, two black robotic grippers are positioned at the bottom of the frame. The right gripper holds a white handheld garment steamer. Initially, the scene is still. Then, the right robotic arm begins to move, lifting the steamer and angling it towards the lower right side of the shirt. The arm maneuvers the steamer with precision, bringing the nozzle close to the fabric's hem, positioning it for the steaming process. The lighting is diffuse and even, highlighting the texture of the wrinkled shirt and the clean lines of the robotic equipment. The video concludes with the steamer held steady against the shirt, poised to begin the task of removing wrinkles."
}
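
Reviewer note: the output above reports a resolution of 1152 x 816 together with an aspect_ratio of "16:9". A minimal sketch of a consistency check follows, assuming the aspect ratio is meant to be derivable from W and H. The helper names `reduced_aspect_ratio` and `aspect_ratio_matches` and the 1% tolerance are hypothetical and not part of the review pipeline.

```python
from math import gcd


def reduced_aspect_ratio(width: int, height: int) -> str:
    """Return width:height reduced to lowest terms."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    g = gcd(width, height)
    return f"{width // g}:{height // g}"


def aspect_ratio_matches(width: int, height: int, reported: str,
                         tolerance: float = 0.01) -> bool:
    """Compare a reported 'A:B' string with the pixel resolution.

    Strings that are not in 'A:B' form count as a mismatch.
    The relative tolerance is an assumed value, not a documented threshold.
    """
    parts = reported.split(":")
    if len(parts) != 2:
        return False
    try:
        a, b = float(parts[0]), float(parts[1])
    except ValueError:
        return False
    if b == 0:
        return False
    actual = width / height
    return abs(a / b - actual) <= tolerance * actual


print(reduced_aspect_ratio(1152, 816))          # 24:17
print(aspect_ratio_matches(1152, 816, "16:9"))  # False
```

By this check 1152 x 816 reduces to 24:17 (about 1.41), so the reported "16:9" does not follow from the reported resolution.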

Cosmos 3 Nano (16B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Nano (16B) · 14.5s · 2026-06-10T18:36:38+0000

Short Read

The video begins with the green shirt hanging motionless on the steamer. The robotic arms remain stationary at the bottom of the frame. At around 3 seconds, the right robotic arm begins to move upward, lifting the white steamer head. By 7 seconds, the steamer head is raised and positioned above the shirt. The arm continues to hold the steamer head steady for the remainder of the video.

Training Caption Field

A static medium shot of a green button-up shirt hanging on a white garment steamer against a plain light gray wall. Two robotic arms with black grippers are positioned at the bottom of the frame, facing the shirt. The steamer head is white and cylindrical, resting on the right side of the steamer stand. At around 3 seconds, the right robotic arm begins to move upward, lifting the white steamer head. By 7 seconds, the steamer head is raised and positioned above the shirt. The arm continues to hold the steamer head steady for the remainder of the video. The lighting is even and neutral, casting soft shadows on the wall behind the shirt. The camera remains static throughout the video.

Full output JSON/text
{
  "scene_imagination": "A static medium shot of a green button-up shirt hanging on a white garment steamer against a plain light gray wall. Two robotic arms with black grippers are positioned at the bottom of the frame, facing the shirt. The steamer head is white and cylindrical, resting on the right side of the steamer stand. The lighting is even and neutral, casting soft shadows on the wall behind the shirt.",
  "temporal_caption": "The video begins with the green shirt hanging motionless on the steamer. The robotic arms remain stationary at the bottom of the frame. At around 3 seconds, the right robotic arm begins to move upward, lifting the white steamer head. By 7 seconds, the steamer head is raised and positioned above the shirt. The arm continues to hold the steamer head steady for the remainder of the video.",
  "audio_description": "A low, steady hum from the garment steamer motor is audible throughout the video. There is a faint mechanical clicking sound as the robotic arm's joints move to lift the steamer head.",
  "subjects": [
    {
      "description": "A short-sleeved, button-up shirt in a muted sage green color, hanging on a white garment steamer.",
      "appearance_details": "The shirt has a chest pocket on the left side, a vertical row of white buttons down the front, and a slightly wrinkled texture.",
      "relationship": "The primary object being manipulated by the robotic arms.",
      "location": "Center of the frame, hanging vertically.",
      "relative_size": "Large within frame.",
      "orientation": "Facing the camera.",
      "pose": "Hanging straight down.",
      "action": "Remains stationary throughout the video.",
      "state_changes": "No significant change."
    },
    {
      "description": "A white garment steamer with a cylindrical head and a vertical stand.",
      "appearance_details": "The steamer head is white and cylindrical, attached to a white stand with a base. A black power cord is visible on the right side.",
      "relationship": "Supports the shirt and is being manipulated by the robotic arms.",
      "location": "Right side of the frame, partially obscured by the shirt.",
      "relative_size": "Medium within frame.",
      "orientation": "Facing left.",
      "pose": "Standing upright.",
      "action": "The steamer head is lifted upward by the right robotic arm.",
      "state_changes": "The steamer head moves from a lowered position to a raised position above the shirt."
    },
    {
      "description": "Two robotic arms with black grippers, positioned at the bottom of the frame.",
      "appearance_details": "The arms are black with articulated joints and grippers. They appear to be mounted on a base outside the frame.",
      "relationship": "Manipulating the garment steamer.",
      "location": "Bottom of the frame, one on the left and one on the right.",
      "relative_size": "Small within frame.",
      "orientation": "Facing upward toward the shirt.",
      "pose": "Extended upward.",
      "action": "The right robotic arm moves upward to lift the steamer head.",
      "state_changes": "The right arm moves from a stationary position to a raised position."
    }
  ],
  "background_setting": "A plain, light gray wall with no visible decorations or features. The surface is smooth and evenly lit.",
  "lighting": {
    "conditions": "Even, neutral indoor lighting.",
    "direction": "Front-lit, slightly from above.",
    "shadows": "Soft shadows cast by the shirt and steamer onto the wall behind them.",
    "illumination_effect": "No significant highlights or reflections; the lighting is diffuse and consistent."
  },
  "aesthetics": {
    "composition": "Centered, with the shirt as the focal point and the robotic arms framing the bottom.",
    "color_scheme": "Muted and neutral, dominated by the sage green of the shirt, the white of the steamer, and the black of the robotic arms against a gray background.",
    "mood_atmosphere": "Clinical, technical, and focused.",
    "patterns": "The vertical line of buttons on the shirt provides a subtle linear pattern."
  },
  "cinematography": {
    "camera_angle": "Eye-level, straight-on.",
    "framing": "Medium shot.",
    "depth_of_field": "Deep, with both the shirt and the background in focus.",
    "focus": "Sharp focus on the shirt and the robotic arms.",
    "lens_focal_length": "Standard.",
    "camera_motion": "Static."
  },
  "style_medium": "Live-action video.",
  "artistic_style": "Realistic, technical.",
  "context": "A demonstration of a robotic system manipulating a garment steamer, likely for research or development purposes.",
  "actions": [
    {
      "time": "0:00-0:03",
      "description": "The shirt hangs motionless on the steamer. The robotic arms remain stationary.",
      "actor": "Robotic arms",
      "object": "Garment steamer",
      "contact_event": "No contact.",
      "state_change": "No significant change."
    },
    {
      "time": "0:03-0:07",
      "description": "The right robotic arm begins to move upward, lifting the white steamer head.",
      "actor": "Right robotic arm",
      "object": "Garment steamer head",
      "contact_event": "The gripper makes contact with the steamer head and lifts it.",
      "state_change": "The steamer head moves from a lowered position to a raised position above the shirt."
    },
    {
      "time": "0:07-0:15",
      "description": "The right robotic arm holds the steamer head steady in its raised position.",
      "actor": "Right robotic arm",
      "object": "Garment steamer head",
      "contact_event": "The gripper maintains contact with the steamer head.",
      "state_change": "No significant change."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0:00-0:03",
      "description": "The scene is static, showing the green shirt hanging on the white steamer with the robotic arms at the bottom.",
      "key_changes": "None.",
      "camera": "Static medium shot.",
      "robot_motion": "None.",
      "object_motion": "None.",
      "physical_reasoning": "The objects are stationary, with no interaction occurring."
    },
    {
      "segment_index": 1,
      "time_range": "0:03-0:07",
      "description": "The right robotic arm moves upward to lift the steamer head.",
      "key_changes": "The steamer head is raised.",
      "camera": "Static medium shot.",
      "robot_motion": "The right robotic arm moves upward.",
      "object_motion": "The steamer head moves upward.",
      "physical_reasoning": "The robotic arm's gripper makes contact with the steamer head and lifts it."
    },
    {
      "segment_index": 2,
      "time_range": "0:07-0:15",
      "description": "The right robotic arm holds the steamer head steady in its raised position.",
      "key_changes": "None.",
      "camera": "Static medium shot.",
      "robot_motion": "None.",
      "object_motion": "None.",
      "physical_reasoning": "The robotic arm maintains its position, holding the steamer head steady."
    }
  ],
  "transitions": [],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "4:3",
  "duration": "0:15",
  "fps": 30,
  "robotics": {
    "embodiment": "Two robotic arms with black grippers, mounted on a base outside the frame.",
    "end_effector": "Black grippers.",
    "manipulated_objects": "Garment steamer head.",
    "task_goal": "Lift the garment steamer head to a position above the shirt.",
    "initial_state": "The robotic arms are stationary at the bottom of the frame.",
    "terminal_state": "The right robotic arm is holding the steamer head in a raised position.",
    "affordances": "The steamer head has a cylindrical shape that can be gripped and lifted.",
    "failure_modes": "The gripper could slip off the steamer head or the arm could move too quickly."
  },
  "physical_reasoning": "The robotic arm's gripper makes contact with the steamer head and applies an upward force, lifting it. The force is sufficient to overcome the weight of the steamer head, causing it to move upward. The arm then holds the steamer head steady, maintaining the upward force.",
  "uncertainties": [],
  "comprehensive_t2v_caption": "A static medium shot of a green button-up shirt hanging on a white garment steamer against a plain light gray wall. Two robotic arms with black grippers are positioned at the bottom of the frame, facing the shirt. The steamer head is white and cylindrical, resting on the right side of the steamer stand. At around 3 seconds, the right robotic arm begins to move upward, lifting the white steamer head. By 7 seconds, the steamer head is raised and positioned above the shirt. The arm continues to hold the steamer head steady for the remainder of the video. The lighting is even and neutral, casting soft shadows on the wall behind the shirt. The camera remains static throughout the video."
}
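
Reviewer note: the two p08 outputs above do not use the same keys inside "segments". The preceding output names the last field "physical_logic", while this one names it "physical_reasoning". A small sketch of how such drift could be surfaced automatically is shown below. The function `key_diff` is a hypothetical helper, and the sample dictionaries copy only the key names from the first segment of each output, with values abbreviated.

```python
def key_diff(a: dict, b: dict) -> dict:
    """Report the keys that appear in only one of two JSON objects."""
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
    }


# Key names taken from the first "segments" entry of each output above.
# Values are abbreviated placeholders, not the full model text.
segment_a = {
    "segment_index": 1,
    "time_range": "00:00 - 00:05",
    "description": "...",
    "key_changes": "None.",
    "camera": "Static.",
    "robot_motion": "None.",
    "object_motion": "None.",
    "physical_logic": "...",
}
segment_b = {
    "segment_index": 0,
    "time_range": "0:00-0:03",
    "description": "...",
    "key_changes": "None.",
    "camera": "Static medium shot.",
    "robot_motion": "None.",
    "object_motion": "None.",
    "physical_reasoning": "...",
}

print(key_diff(segment_a, segment_b))
# {'only_in_a': ['physical_logic'], 'only_in_b': ['physical_reasoning']}
```

The same comparison would also flag the "evidence" key, which appears in the "actions" entries of the preceding output but not in this one. It does not catch differences in value conventions, such as segment_index starting at 1 in one output and 0 in the other.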

Cosmos 3 Super (64B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Super (64B) · 17.1s · 2026-06-10T16:12:48+0000

Short Read

The video begins with a static shot of a wrinkled green shirt hanging on a white steamer, flanked by two robotic arms. The right robotic arm then slowly and steadily lifts the steamer's head upwards, bringing it closer to the shirt.

Training Caption Field

A static shot shows a wrinkled light green button-up shirt hanging on a white garment steamer, flanked by two robotic arms with black grippers. The right robotic arm then slowly and steadily lifts the steamer's cylindrical head upwards, bringing it closer to the shirt. The background is a plain, light gray wall, and the lighting is even and diffused. The video captures the precise, controlled motion of the robotic arm as it interacts with the steamer.

Full output JSON/text
{
  "scene_imagination": "A light green, short-sleeved button-up shirt hangs on a white steamer, slightly wrinkled. Two robotic arms with black grippers are positioned in front of the shirt. The right robotic arm begins to move upward, lifting the steamer's head towards the shirt. The background is a plain, light gray wall, and the lighting is even and diffused.",
  "temporal_caption": "The video begins with a static shot of a wrinkled green shirt hanging on a white steamer, flanked by two robotic arms. The right robotic arm then slowly and steadily lifts the steamer's head upwards, bringing it closer to the shirt.",
  "audio_description": "The video is silent, with no audible sounds or music.",
  "subjects": [
    {
      "description": "A light green, short-sleeved button-up shirt hanging on a white steamer. The shirt has visible wrinkles and a chest pocket on the left side.",
      "appearance_details": "Light green fabric, short sleeves, button-down front, chest pocket on the left.",
      "relationship": "Hanging on the steamer, positioned between the two robotic arms.",
      "location": "Center of the frame.",
      "relative_size": "Large, occupying most of the vertical space.",
      "orientation": "Facing the camera.",
      "pose": "Hanging vertically.",
      "action": "Remains stationary throughout the video.",
      "state_changes": "No significant change."
    },
    {
      "description": "A white garment steamer with a cylindrical head and a vertical stand.",
      "appearance_details": "White plastic construction, cylindrical steamer head, vertical stand.",
      "relationship": "Supporting the green shirt, positioned behind the robotic arms.",
      "location": "Center, behind the shirt.",
      "relative_size": "Large, extending from the bottom to the top of the frame.",
      "orientation": "Vertical.",
      "pose": "Standing upright.",
      "action": "The steamer head is lifted upwards by the right robotic arm.",
      "state_changes": "The steamer head moves from a lower position to a higher position."
    },
    {
      "description": "Two robotic arms with black grippers, positioned on either side of the shirt.",
      "appearance_details": "Black metallic arms with articulated joints and black grippers. The left arm has a white rectangular component attached to it.",
      "relationship": "Positioned in front of the shirt and steamer, interacting with the steamer.",
      "location": "Left and right sides of the frame.",
      "relative_size": "Medium, extending from the bottom to the middle of the frame.",
      "orientation": "Facing the shirt and steamer.",
      "pose": "Articulated, with grippers open.",
      "action": "The right robotic arm lifts the steamer head upwards.",
      "state_changes": "The right robotic arm moves upwards, lifting the steamer head."
    }
  ],
  "background_setting": "A plain, light gray wall serves as the background. The setting appears to be an indoor environment, possibly a laboratory or testing facility.",
  "lighting": {
    "conditions": "Even, diffused indoor lighting.",
    "direction": "Front-lit, illuminating the shirt and robotic arms clearly.",
    "shadows": "Soft shadows are cast on the wall behind the shirt and steamer.",
    "illumination_effect": "The lighting highlights the wrinkles on the shirt and the metallic surfaces of the robotic arms."
  },
  "aesthetics": {
    "composition": "Centered, with the shirt and steamer as the main focal point, flanked by the robotic arms.",
    "color_scheme": "Muted tones, primarily light green, white, and black.",
    "mood_atmosphere": "Clinical, focused, and technological.",
    "patterns": "The shirt has a subtle, textured pattern due to the wrinkles."
  },
  "cinematography": {
    "camera_motion": "Static shot throughout the video.",
    "framing": "Medium shot, capturing the shirt, steamer, and robotic arms.",
    "camera_angle": "Eye-level.",
    "depth_of_field": "Deep, with both the foreground (robotic arms) and background (shirt and wall) in focus.",
    "focus": "Sharp focus on the shirt, steamer, and robotic arms.",
    "lens_focal_length": "Standard lens."
  },
  "style_medium": "Live-action video.",
  "artistic_style": "Realistic, documentary-style.",
  "context": "A demonstration of a robotic system interacting with a garment steamer and a wrinkled shirt, likely for testing or research purposes.",
  "actions": [
    {
      "time": "0:00-0:05",
      "description": "The scene is static, showing the wrinkled green shirt hanging on the white steamer, with the two robotic arms positioned in front.",
      "actor": "None.",
      "object": "Shirt, steamer, robotic arms.",
      "contact_event": "None.",
      "state_changes": "None."
    },
    {
      "time": "0:05-0:15",
      "description": "The right robotic arm begins to move upwards, lifting the steamer head towards the shirt.",
      "actor": "Right robotic arm.",
      "object": "Steamer head.",
      "contact_event": "The right robotic arm grasps and lifts the steamer head.",
      "state_changes": "The steamer head moves from a lower position to a higher position."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0:00-0:05",
      "description": "Static shot of the wrinkled green shirt hanging on the white steamer, with the two robotic arms positioned in front.",
      "key_changes": "None.",
      "camera": "Static medium shot.",
      "robot_motion": "None.",
      "object_motion": "None."
    },
    {
      "segment_index": 1,
      "time_range": "0:05-0:15",
      "description": "The right robotic arm lifts the steamer head upwards, bringing it closer to the shirt.",
      "key_changes": "The steamer head moves upwards.",
      "camera": "Static medium shot.",
      "robot_motion": "The right robotic arm moves upwards.",
      "object_motion": "The steamer head moves upwards."
    }
  ],
  "transitions": [
    "No notable transitions; the video is a continuous shot."
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "16,9",
  "duration": "15.367s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Two robotic arms with black grippers, positioned on either side of the shirt.",
    "end_effector": "Black grippers.",
    "manipulated_objects": "Steamer head.",
    "task_goal": "To lift the steamer head upwards.",
    "initial_state": "The steamer head is in a lower position.",
    "terminal_state": "The steamer head is lifted to a higher position.",
    "affordances": "The robotic arms have the ability to grasp and lift objects.",
    "failure_modes": "None observed."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The right robotic arm grasps the steamer head and lifts it upwards.",
    "support_and_stability": "The steamer is supported by its stand, and the robotic arms provide stability during the lifting motion.",
    "force_gravity_friction": "The robotic arm applies force to overcome gravity and lift the steamer head.",
    "causal_chain": "The right robotic arm initiates the lifting motion, causing the steamer head to move upwards.",
    "object_permanence": "The shirt, steamer, and robotic arms remain in the scene throughout the video.",
    "next_action_prediction": "The right robotic arm will continue to lift the steamer head until it reaches the desired position."
  },
  "uncertainties": [],
  "comprehensive_t2v_caption": "A static shot shows a wrinkled light green button-up shirt hanging on a white garment steamer, flanked by two robotic arms with black grippers. The right robotic arm then slowly and steadily lifts the steamer's cylindrical head upwards, bringing it closer to the shirt. The background is a plain, light gray wall, and the lighting is even and diffused. The video captures the precise, controlled motion of the robotic arm as it interacts with the steamer."
}
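
Reviewer note: the three p08 outputs above report "duration" in three different shapes: "15.367", "0:15", and "15.367s". Before comparing them, it may help to normalize each value to seconds. The sketch below is one possible approach. `duration_to_seconds` is a hypothetical helper, and it assumes colon-separated values are in [[hh:]mm:]ss order.

```python
def duration_to_seconds(value) -> float:
    """Convert a duration such as 15.367, '15.367s', or '0:15' to seconds."""
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip().lower()
    # Drop a trailing unit suffix such as the 's' in '15.367s'.
    if text.endswith("s"):
        text = text[:-1].strip()
    if ":" in text:
        # Assumed [[hh:]mm:]ss layout: fold parts from left to right.
        seconds = 0.0
        for part in text.split(":"):
            seconds = seconds * 60 + float(part)
        return seconds
    return float(text)


for raw in ["15.367", "0:15", "15.367s"]:
    print(raw, "->", duration_to_seconds(raw))
```

After normalization, "0:15" reads as 15.0 seconds and the other two as 15.367 seconds, so the remaining difference comes from rounding rather than format.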

Cosmos 3 Super (64B) / p09: Reproduction schema

ok
p09: Reproduction schema · Cosmos 3 Super (64B) · 18.9s · 2026-06-10T18:54:35+0000

Short Read

Two robot arms are positioned below a hanging green shirt, with the right arm lifting a white cylindrical object upward.

Training Caption Field

The video shows a green shirt hanging in the center of the frame, with two robot arms positioned below it. The left robot arm remains stationary throughout the video. The right robot arm, which is holding a white cylindrical object, begins to lift the object upward. The object is lifted steadily until it reaches a higher position in the frame. The background is a plain gray wall, and the lighting is even and consistent.

Full output JSON/text
{
  "schema_version": "p09_reproduction_v1",
  "one_sentence_summary": "Two robot arms are positioned below a hanging green shirt, with the right arm lifting a white cylindrical object upward.",
  "video_metadata": {
    "resolution": {
      "W": 1152,
      "H": 816
    },
    "aspect_ratio": "16,9",
    "duration_sec": 15.367,
    "fps": 30.0,
    "frame_count": 461
  },
  "camera": {
    "viewpoint": "front",
    "framing": "medium",
    "camera_angle": "eye-level",
    "camera_motion": "static",
    "estimated_height_or_distance": "1m",
    "lens_or_distortion": "none",
    "crop_boundaries": "none",
    "stability": "stable"
  },
  "scene_layout": {
    "coordinate_frame": "2d",
    "background": "plain gray wall",
    "surface": "none",
    "lighting": "even indoor lighting",
    "global_spatial_map": "green shirt hanging in center, two robot arms below, white cylindrical object held by right arm",
    "occlusions": "none"
  },
  "subjects": [
    {
      "id": "robot_arm_left",
      "category": "robot arm",
      "appearance": "black and white mechanical arm with joints and gripper",
      "visible_parts": "arm, gripper",
      "initial_pose": "resting below shirt",
      "final_pose": "resting below shirt",
      "workspace_location": "bottom left",
      "motion_path": "none",
      "end_effector": "gripper",
      "uncertainties": "none"
    },
    {
      "id": "robot_arm_right",
      "category": "robot arm",
      "appearance": "black and white mechanical arm with joints and gripper",
      "visible_parts": "arm, gripper",
      "initial_pose": "resting below shirt",
      "final_pose": "holding white cylindrical object",
      "workspace_location": "bottom right",
      "motion_path": "upward",
      "end_effector": "gripper",
      "uncertainties": "none"
    }
  ],
  "objects": [
    {
      "id": "green_shirt",
      "category": "shirt",
      "appearance": "light green, button-down, short-sleeved, wrinkled",
      "label_or_text": "none",
      "material": "fabric",
      "shape": "rectangular with sleeves",
      "size_relative_to_frame": "large",
      "initial_pose": "hanging vertically",
      "final_pose": "hanging vertically",
      "state_changes": "none",
      "uncertainties": "none"
    },
    {
      "id": "white_cylindrical_object",
      "category": "cylindrical object",
      "appearance": "white, smooth, cylindrical",
      "label_or_text": "none",
      "material": "plastic or foam",
      "shape": "cylinder",
      "size_relative_to_frame": "medium",
      "initial_pose": "resting on right arm",
      "final_pose": "lifted upward",
      "state_changes": "lifted",
      "uncertainties": "none"
    }
  ],
  "initial_state": "Green shirt hanging in center, both robot arms resting below it, white cylindrical object resting on right arm.",
  "final_state": "Green shirt hanging in center, left robot arm resting below it, right robot arm holding white cylindrical object lifted upward.",
  "timeline": [
    {
      "time_range": "0-5s",
      "description": "Both robot arms are resting below the green shirt.",
      "active_subjects": [],
      "object_states": {
        "green_shirt": "hanging vertically",
        "robot_arm_left": "resting",
        "robot_arm_right": "resting",
        "white_cylindrical_object": "resting on right arm"
      },
      "robot_motion": "none",
      "object_motion": "none",
      "gripper_state": "open",
      "contact_state": "none",
      "spatial_changes": "none",
      "physical_causality": "none"
    },
    {
      "time_range": "5-10s",
      "description": "Right robot arm begins to lift the white cylindrical object upward.",
      "active_subjects": [
        "robot_arm_right"
      ],
      "object_states": {
        "green_shirt": "hanging vertically",
        "robot_arm_left": "resting",
        "robot_arm_right": "lifting",
        "white_cylindrical_object": "being lifted"
      },
      "robot_motion": "upward",
      "object_motion": "upward",
      "gripper_state": "closed",
      "contact_state": "gripping",
      "spatial_changes": "white cylindrical object moves upward",
      "physical_causality": "robot arm lifts object"
    },
    {
      "time_range": "10-15s",
      "description": "Right robot arm continues to lift the white cylindrical object upward.",
      "active_subjects": [
        "robot_arm_right"
      ],
      "object_states": {
        "green_shirt": "hanging vertically",
        "robot_arm_left": "resting",
        "robot_arm_right": "lifting",
        "white_cylindrical_object": "lifted upward"
      },
      "robot_motion": "upward",
      "object_motion": "upward",
      "gripper_state": "closed",
      "contact_state": "gripping",
      "spatial_changes": "white cylindrical object moves further upward",
      "physical_causality": "robot arm lifts object"
    }
  ],
  "parallel_action_graph": {
    "simultaneous_actions": [],
    "dependencies": [],
    "ordering_constraints": []
  },
  "keyframes": [
    {
      "time": 0,
      "frame_description": "Green shirt hanging in center, both robot arms resting below it, white cylindrical object resting on right arm.",
      "subject_poses": {
        "robot_arm_left": "resting",
        "robot_arm_right": "resting"
      },
      "object_poses": {
        "green_shirt": "hanging vertically",
        "white_cylindrical_object": "resting on right arm"
      },
      "contacts": []
    },
    {
      "time": 5,
      "frame_description": "Right robot arm begins to lift the white cylindrical object upward.",
      "subject_poses": {
        "robot_arm_left": "resting",
        "robot_arm_right": "lifting"
      },
      "object_poses": {
        "green_shirt": "hanging vertically",
        "white_cylindrical_object": "being lifted"
      },
      "contacts": [
        {
          "subject_id": "robot_arm_right",
          "object_id": "white_cylindrical_object",
          "contact_type": "gripping"
        }
      ]
    },
    {
      "time": 10,
      "frame_description": "Right robot arm continues to lift the white cylindrical object upward.",
      "subject_poses": {
        "robot_arm_left": "resting",
        "robot_arm_right": "lifting"
      },
      "object_poses": {
        "green_shirt": "hanging vertically",
        "white_cylindrical_object": "lifted upward"
      },
      "contacts": [
        {
          "subject_id": "robot_arm_right",
          "object_id": "white_cylindrical_object",
          "contact_type": "gripping"
        }
      ]
    },
    {
      "time": 15,
      "frame_description": "Right robot arm holds the white cylindrical object lifted upward.",
      "subject_poses": {
        "robot_arm_left": "resting",
        "robot_arm_right": "holding"
      },
      "object_poses": {
        "green_shirt": "hanging vertically",
        "white_cylindrical_object": "lifted upward"
      },
      "contacts": [
        {
          "subject_id": "robot_arm_right",
          "object_id": "white_cylindrical_object",
          "contact_type": "gripping"
        }
      ]
    }
  ],
  "contact_events": [
    {
      "time": 5,
      "subject_id": "robot_arm_right",
      "object_id": "white_cylindrical_object",
      "contact_type": "gripping",
      "contact_location": "gripper",
      "grasp_or_support_quality": "secure",
      "release_or_transfer": "none",
      "visible_evidence": "gripper closes around object"
    }
  ],
  "physics_and_affordances": {
    "support_surfaces": [],
    "gravity_and_stability": "stable",
    "force_or_constraint_reasoning": "robot arm lifts object against gravity",
    "collision_or_near_collision": "none",
    "object_permanence": "all objects remain present",
    "task_goal": "lift white cylindrical object",
    "failure_modes": [],
    "next_action_prediction": "none"
  },
  "appearance_fidelity": {
    "colors": "green shirt, black and white robot arms, white cylindrical object, gray background",
    "textures": "fabric shirt, smooth plastic or foam object",
    "lighting_shadows": "even indoor lighting, soft shadows",
    "robot_morphology": "mechanical arms with joints and grippers",
    "object_fine_details": "wrinkles on shirt, smooth surface on object",
    "background_fine_details": "plain gray wall"
  },
  "text_and_labels": [],
  "audio_description": "none",
  "generation_controls": {
    "shot_type": "medium",
    "camera_motion_instruction": "static",
    "motion_style": "realistic",
    "duration_instruction": "15.367s",
    "spatial_consistency_requirements": "consistent",
    "negative_constraints": "none",
    "recommended_caption_style": "descriptive"
  },
  "uncertainties": [],
  "comprehensive_reproduction_caption": "The video shows a green shirt hanging in the center of the frame, with two robot arms positioned below it. The left robot arm remains stationary throughout the video. The right robot arm, which is holding a white cylindrical object, begins to lift the object upward. The object is lifted steadily until it reaches a higher position in the frame. The background is a plain gray wall, and the lighting is even and consistent."
}

rovid_006

top

Qwen 3.5 9B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 9B · 64.1s · 2026-06-10T17:48:11+0000

Short Read

At the beginning, a robotic gripper holds a brown cap near a coat rack. In the middle, the robot lifts the cap and maneuvers it towards the hook on the rack. By the end, the cap is successfully hung on the hook, and the robot arm retracts slightly.

Training Caption Field

A black robotic arm with a multi-fingered gripper is shown in a high-angle close-up, holding a brown fabric cap. The robot lifts the cap and maneuvers it towards a black metal coat rack standing on a light wood floor. With precision, the robot positions the cap over a hook on the rack and releases it, allowing gravity to pull the cap onto the hook where it hangs securely. The robot arm then retracts slightly, completing the task of hanging the cap.

Full output JSON/text
{
  "scene_imagination": "A close-up, high-angle view of a robotic manipulation task in a domestic interior. A black robotic arm with a multi-fingered gripper holds a brown fabric cap. The arm is positioned near a black metal coat rack with a silver circular base. A beige upholstered chair sits to the right. The floor is light wood laminate. The lighting is bright and even, casting soft shadows.",
  "temporal_caption": "At the beginning, a robotic gripper holds a brown cap near a coat rack. In the middle, the robot lifts the cap and maneuvers it towards the hook on the rack. By the end, the cap is successfully hung on the hook, and the robot arm retracts slightly.",
  "audio_description": null,
  "subjects": [
    {
      "description": "Robotic arm and gripper",
      "appearance_details": "Black, metallic, multi-jointed arm with a complex, multi-fingered end-effector.",
      "relationship": "Manipulator",
      "location": "Left foreground",
      "relative_size": "Large relative to the cap",
      "orientation": "Angled towards the coat rack",
      "pose": "Dynamic, moving from holding to releasing",
      "action": "Holding, lifting, positioning, releasing",
      "state_changes": "Moves from a static hold to a release motion"
    },
    {
      "description": "Brown cap",
      "appearance_details": "Fabric material, brown color, structured crown, red tag visible on the brim.",
      "relationship": "Object being manipulated",
      "location": "Initially held by robot, then on coat rack",
      "relative_size": "Small",
      "orientation": "Varies as it is moved",
      "pose": "Held, then hanging",
      "action": "Being moved, then hanging",
      "state_changes": "Transitions from being held in air to hanging on a hook"
    },
    {
      "description": "Coat rack",
      "appearance_details": "Black metal pole, silver circular base, curved hooks at the top.",
      "relationship": "Target object",
      "location": "Center background",
      "relative_size": "Tall, vertical",
      "orientation": "Vertical",
      "pose": "Stationary",
      "action": "None",
      "state_changes": "None"
    }
  ],
  "background_setting": "An indoor room with light wood laminate flooring, light grey walls, and a white electrical outlet visible on the wall. A beige upholstered dining chair is positioned to the right of the coat rack.",
  "lighting": {
    "conditions": "Indoor, artificial, bright",
    "direction": "Overhead and diffuse",
    "shadows": "Soft shadows cast by the robot and chair onto the floor",
    "illumination_effect": "Clear visibility of object textures and robot mechanics"
  },
  "aesthetics": {
    "composition": "High-angle, slightly tilted perspective focusing on the interaction zone",
    "color_scheme": "Neutral tones (black, beige, grey) with a pop of brown from the cap",
    "mood_atmosphere": "Clinical, functional, focused",
    "patterns": "Wood grain on floor, fabric texture on cap and chair"
  },
  "cinematography": {
    "framing": "Medium close-up",
    "camera_angle": "High angle, looking down",
    "depth_of_field": "Deep focus, keeping robot and background sharp",
    "focus": "Sharp on the robot gripper and cap",
    "lens_focal_length": "Unknown, likely standard or slightly wide",
    "camera_motion": "Static"
  },
  "style_medium": "robotics video",
  "artistic_style": "real-world robot manipulation footage",
  "context": "A demonstration of a robot's ability to perform a household task: hanging a piece of clothing on a rack.",
  "actions": [
    {
      "time": "0.0s - 2.0s",
      "description": "The robot holds the brown cap in its gripper, adjusting its grip slightly.",
      "actor": "Robot arm",
      "object": "Brown cap",
      "contact_event": "Continuous contact",
      "state_change": "None",
      "evidence": "Cap remains in gripper"
    },
    {
      "time": "2.0s - 6.0s",
      "description": "The robot lifts the cap and moves it towards the coat rack hook.",
      "actor": "Robot arm",
      "object": "Brown cap",
      "contact_event": "Continuous contact",
      "state_change": "Cap position changes from held to near rack",
      "evidence": "Cap moves upwards and towards the rack"
    },
    {
      "time": "6.0s - 9.0s",
      "description": "The robot positions the cap over the hook and releases it.",
      "actor": "Robot arm",
      "object": "Brown cap",
      "contact_event": "Release",
      "state_change": "Cap transitions from held to hanging",
      "evidence": "Cap drops onto the hook and hangs freely"
    },
    {
      "time": "9.0s - 11.6s",
      "description": "The robot arm retracts and moves away from the now-hung cap.",
      "actor": "Robot arm",
      "object": "None",
      "contact_event": "None",
      "state_change": "Robot moves away",
      "evidence": "Arm moves out of frame or away from the rack"
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "0.0s - 2.0s",
      "description": "Initial state where the robot holds the cap.",
      "key_changes": "Minor adjustments in grip",
      "camera": "Static",
      "robot_motion": "Minimal",
      "object_motion": "Minimal",
      "physical_logic": "Stable grasp"
    },
    {
      "segment_index": 2,
      "time_range": "2.0s - 6.0s",
      "description": "Lifting and positioning phase.",
      "key_changes": "Cap moves towards the rack",
      "camera": "Static",
      "robot_motion": "Lifting and translating",
      "object_motion": "Translating upwards and forwards",
      "physical_logic": "Overcoming gravity to position object"
    },
    {
      "segment_index": 3,
      "time_range": "6.0s - 9.0s",
      "description": "Release and hanging phase.",
      "key_changes": "Cap is released and hangs on the hook",
      "camera": "Static",
      "robot_motion": "Retracting",
      "object_motion": "Dropping onto hook",
      "physical_logic": "Gravity pulls cap onto hook, friction holds it"
    },
    {
      "segment_index": 4,
      "time_range": "9.0s - 11.6s",
      "description": "Final state with cap hung and robot moving away.",
      "key_changes": "Robot arm moves away",
      "camera": "Static",
      "robot_motion": "Retracting",
      "object_motion": "Stationary",
      "physical_logic": "Task completion"
    }
  ],
  "transitions": [
    "Lifting motion",
    "Positioning motion",
    "Release event",
    "Retraction motion"
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "4:3",
  "duration": "11.633s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Multi-jointed robotic arm",
    "end_effector": "Multi-fingered gripper",
    "manipulated_objects": [
      "Brown cap"
    ],
    "task_goal": "Hang the cap on the coat rack",
    "initial_state": "Cap held in gripper",
    "terminal_state": "Cap hanging on rack, gripper empty",
    "affordances": "Gripper provides grasp and release capability",
    "failure_modes": "None observed"
  },
  "physical_reasoning": {
    "contact_and_grasp": "The gripper maintains a firm hold on the cap until the precise moment of release.",
    "support_and_stability": "The coat rack provides a stable vertical support for the cap.",
    "force_gravity_friction": "Gravity pulls the cap down onto the hook; friction between the cap and hook prevents it from sliding off.",
    "causal_chain": "Robot lifts cap -> Robot positions cap over hook -> Robot releases cap -> Cap falls onto hook -> Cap hangs.",
    "object_permanence": "The cap remains intact and functional throughout the manipulation.",
    "next_action_prediction": "The robot may move to pick up another object or return to a home position."
  },
  "uncertainties": [
    "The specific control algorithm used by the robot is unknown."
  ],
  "comprehensive_t2v_caption": "A black robotic arm with a multi-fingered gripper is shown in a high-angle close-up, holding a brown fabric cap. The robot lifts the cap and maneuvers it towards a black metal coat rack standing on a light wood floor. With precision, the robot positions the cap over a hook on the rack and releases it, allowing gravity to pull the cap onto the hook where it hangs securely. The robot arm then retracts slightly, completing the task of hanging the cap."
}

Qwen 3.5 27B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 27B · 175.1s · 2026-06-10T18:14:18+0000

Short Read

At the beginning, the left robotic gripper holds a brown cap near the base of a coat rack. In the early phase, the gripper lifts the cap upwards. During the middle section, the gripper maneuvers the cap towards a hook on the coat rack. Towards the late phase, the cap is successfully placed onto the hook. By the end, the gripper releases the cap and retracts, leaving the cap hanging on the rack.

Training Caption Field

In a high-angle static shot, a dual-armed robot performs a manipulation task in a domestic setting. The left robotic gripper, black and multi-fingered, initially holds a brown fabric cap near the base of a black coat rack with a silver base. The right gripper remains idle. The left gripper lifts the cap upwards, maneuvering it towards a hook on the coat rack. With precise movements, the gripper places the cap onto the hook and releases it, allowing the cap to hang freely. The scene is evenly lit, highlighting the neutral tones of the room, which includes a beige curtain and a grey upholstered chair. The video concludes with the cap successfully hung on the coat rack, demonstrating the robot's capability in object manipulation and spatial awareness.

Full output JSON/text
{
  "scene_imagination": "A high-angle, static shot captures a robotic manipulation task in a domestic setting. Two black, multi-fingered robotic grippers are visible; the left gripper holds a brown fabric cap, while the right gripper remains idle. In the center stands a black coat rack with a silver base, positioned next to a beige curtain and a grey upholstered chair. The floor is light wood laminate, and a grey wall with a white electrical outlet is visible in the background.",
  "temporal_caption": "At the beginning, the left robotic gripper holds a brown cap near the base of a coat rack. In the early phase, the gripper lifts the cap upwards. During the middle section, the gripper maneuvers the cap towards a hook on the coat rack. Towards the late phase, the cap is successfully placed onto the hook. By the end, the gripper releases the cap and retracts, leaving the cap hanging on the rack.",
  "audio_description": null,
  "subjects": [
    {
      "description": "Left robotic arm with a multi-fingered gripper.",
      "appearance_details": "Black, mechanical, articulated fingers.",
      "relationship": "Manipulator of the cap.",
      "location": "Left side of the frame.",
      "relative_size": "Medium.",
      "orientation": "Facing the coat rack.",
      "pose": "Active manipulation.",
      "action": "Lifting and hanging the cap.",
      "state_changes": "Moves from holding the cap to releasing it."
    },
    {
      "description": "Right robotic arm with a multi-fingered gripper.",
      "appearance_details": "Black, mechanical, articulated fingers.",
      "relationship": "Idle observer.",
      "location": "Right side of the frame.",
      "relative_size": "Medium.",
      "orientation": "Facing the coat rack.",
      "pose": "Stationary.",
      "action": "None.",
      "state_changes": "None."
    },
    {
      "description": "Brown fabric cap.",
      "appearance_details": "Soft material, brown color, red tag visible.",
      "relationship": "Object being manipulated.",
      "location": "Held by the left gripper, then on the coat rack.",
      "relative_size": "Small.",
      "orientation": "Varies during manipulation.",
      "pose": "Held, then hanging.",
      "action": "Being moved and hung.",
      "state_changes": "Transitions from being held to hanging on a hook."
    },
    {
      "description": "Coat rack.",
      "appearance_details": "Black pole, silver circular base, black hooks.",
      "relationship": "Target for the cap.",
      "location": "Center of the frame.",
      "relative_size": "Large.",
      "orientation": "Vertical.",
      "pose": "Stationary.",
      "action": "None.",
      "state_changes": "None."
    }
  ],
  "background_setting": "An indoor room with light wood flooring, a grey wall, a beige curtain, and a grey upholstered chair.",
  "lighting": {
    "conditions": "Indoor artificial lighting.",
    "direction": "Overhead.",
    "shadows": "Soft shadows cast by the robot and coat rack.",
    "illumination_effect": "Even illumination across the scene."
  },
  "aesthetics": {
    "composition": "Centered on the coat rack with robotic arms framing the action.",
    "color_scheme": "Neutral tones with black robots and brown cap.",
    "mood_atmosphere": "Functional and technical.",
    "patterns": "Minimal patterns, mostly solid colors."
  },
  "cinematography": {
    "framing": "Medium shot.",
    "camera_angle": "High angle looking down.",
    "depth_of_field": "Deep focus.",
    "focus": "Sharp on the robot and objects.",
    "lens_focal_length": "Unknown.",
    "camera_motion": "Static."
  },
  "style_medium": "robotics video",
  "artistic_style": "real-world robot manipulation footage",
  "context": "The inferred task is for the robot to hang a cap on a coat rack, demonstrating object manipulation and spatial awareness.",
  "actions": [
    {
      "time": "00:00 - 00:02",
      "description": "The left robotic gripper holds the brown cap near the base of the coat rack.",
      "actor": "Left robotic gripper.",
      "object": "Brown cap.",
      "contact_event": "Gripper holding cap.",
      "state_change": "Cap is stationary in the gripper.",
      "evidence": "Visual confirmation of the gripper holding the cap."
    },
    {
      "time": "00:02 - 00:06",
      "description": "The left robotic gripper lifts the cap upwards towards the hook.",
      "actor": "Left robotic gripper.",
      "object": "Brown cap.",
      "contact_event": "Gripper holding cap.",
      "state_change": "Cap moves upwards.",
      "evidence": "Visual confirmation of the cap's upward movement."
    },
    {
      "time": "00:06 - 00:09",
      "description": "The left robotic gripper maneuvers the cap onto the hook.",
      "actor": "Left robotic gripper.",
      "object": "Brown cap.",
      "contact_event": "Gripper placing cap on hook.",
      "state_change": "Cap is positioned on the hook.",
      "evidence": "Visual confirmation of the cap being placed on the hook."
    },
    {
      "time": "00:09 - 00:11",
      "description": "The left robotic gripper releases the cap and retracts.",
      "actor": "Left robotic gripper.",
      "object": "Brown cap.",
      "contact_event": "Gripper releasing cap.",
      "state_change": "Cap hangs freely on the hook.",
      "evidence": "Visual confirmation of the gripper releasing the cap."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "00:00 - 00:02",
      "description": "Initial state with the cap held by the left gripper.",
      "key_changes": "None.",
      "camera": "Static.",
      "robot_motion": "Minimal.",
      "object_motion": "None.",
      "physical_logic": "Stable grasp."
    },
    {
      "segment_index": 2,
      "time_range": "00:02 - 00:06",
      "description": "Lifting the cap towards the hook.",
      "key_changes": "Cap moves upwards.",
      "camera": "Static.",
      "robot_motion": "Upward movement.",
      "object_motion": "Upward movement.",
      "physical_logic": "Controlled lift."
    },
    {
      "segment_index": 3,
      "time_range": "00:06 - 00:09",
      "description": "Placing the cap on the hook.",
      "key_changes": "Cap is positioned on the hook.",
      "camera": "Static.",
      "robot_motion": "Precise placement.",
      "object_motion": "Placement on hook.",
      "physical_logic": "Successful hanging."
    },
    {
      "segment_index": 4,
      "time_range": "00:09 - 00:11",
      "description": "Releasing the cap and retracting.",
      "key_changes": "Cap hangs freely.",
      "camera": "Static.",
      "robot_motion": "Retraction.",
      "object_motion": "None.",
      "physical_logic": "Release and stability."
    }
  ],
  "transitions": [
    "Lifting the cap.",
    "Placing the cap on the hook.",
    "Releasing the cap."
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "4:3",
  "duration": "11.633s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Dual robotic arms with multi-fingered grippers.",
    "end_effector": "Multi-fingered grippers.",
    "manipulated_objects": "Brown fabric cap.",
    "task_goal": "Hang the cap on the coat rack.",
    "initial_state": "Cap held by the left gripper.",
    "terminal_state": "Cap hanging on the coat rack.",
    "affordances": "Gripping, lifting, placing.",
    "failure_modes": "Dropping the cap, missing the hook."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The gripper maintains a firm grasp on the cap during movement.",
    "support_and_stability": "The coat rack provides a stable support for the cap.",
    "force_gravity_friction": "Gravity acts on the cap, but the gripper counteracts it during movement.",
    "causal_chain": "Lifting leads to placement, which leads to hanging.",
    "object_permanence": "The cap remains visible throughout the sequence.",
    "next_action_prediction": "The robot may move to another task or return to a home position."
  },
  "uncertainties": [
    "The exact model of the robotic arms is not specified."
  ],
  "comprehensive_t2v_caption": "In a high-angle static shot, a dual-armed robot performs a manipulation task in a domestic setting. The left robotic gripper, black and multi-fingered, initially holds a brown fabric cap near the base of a black coat rack with a silver base. The right gripper remains idle. The left gripper lifts the cap upwards, maneuvering it towards a hook on the coat rack. With precise movements, the gripper places the cap onto the hook and releases it, allowing the cap to hang freely. The scene is evenly lit, highlighting the neutral tones of the room, which includes a beige curtain and a grey upholstered chair. The video concludes with the cap successfully hung on the coat rack, demonstrating the robot's capability in object manipulation and spatial awareness."
}
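
Reviewer note: the two Qwen outputs above do not agree on value types for the same p08 key. The 9B output gives robotics.manipulated_objects as a list, while the 27B output gives it as a plain string. The following is a minimal sketch, not part of the review tooling, of how such differences could be flagged before outputs are compared field by field. The function name type_mismatches and the trimmed example dictionaries are assumptions made for illustration.

```python
from typing import List


def type_mismatches(a: dict, b: dict, prefix: str = "") -> List[str]:
    """Return dotted key paths whose JSON value types differ between a and b.

    Only keys present in both outputs are compared; nested objects are
    walked recursively. Lists are compared by type only, not by element.
    """
    diffs = []
    for key in sorted(set(a) & set(b)):
        path = f"{prefix}{key}"
        va, vb = a[key], b[key]
        if type(va) is not type(vb):
            diffs.append(path)
        elif isinstance(va, dict):
            diffs.extend(type_mismatches(va, vb, prefix=path + "."))
    return diffs


# Trimmed excerpts of the two outputs above (hypothetical variable names).
qwen_9b = {"fps": 30.0, "robotics": {"manipulated_objects": ["Brown cap"]}}
qwen_27b = {"fps": 30.0, "robotics": {"manipulated_objects": "Brown fabric cap."}}

mismatched = type_mismatches(qwen_9b, qwen_27b)
print(mismatched)
```

On the excerpts shown, only the manipulated_objects path is reported; keys that appear in just one output are ignored by design, so a missing-key check would need to be a separate pass.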

Cosmos 3 Nano (16B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Nano (16B) · 14.9s · 2026-06-10T18:36:53+0000

Short Read

The robot arm holds a brown cloth bag and moves it towards a coat rack. The bag is then hung on the coat rack, and the robot arm releases it. The bag hangs freely from the rack for the remainder of the video.

Training Caption Field

A robot arm with a mechanical gripper manipulates a brown cloth bag with a red tag, moving it towards a coat rack. The robot arm successfully hangs the bag on the coat rack and then releases it, moving away from the rack. The scene takes place indoors, with a chair, a white box, and a curtain visible in the background. The lighting is bright and even, and the camera angle is low, focusing on the interaction between the robot arm and the bag.

Full output JSON/text
{
  "scene_imagination": "A robot arm is manipulating a brown cloth bag near a coat rack in an indoor setting. The scene includes a chair, a white box, and a curtain in the background. The robot arm moves the bag, causing it to hang from the coat rack. The lighting is even, and the camera angle is low, focusing on the interaction between the robot and the bag.",
  "temporal_caption": "The robot arm holds a brown cloth bag and moves it towards a coat rack. The bag is then hung on the coat rack, and the robot arm releases it. The bag hangs freely from the rack for the remainder of the video.",
  "audio_description": "The sound of the robot arm moving and the bag being hung on the coat rack.",
  "subjects": [
    {
      "description": "A robot arm with a mechanical gripper holding a brown cloth bag.",
      "appearance_details": "The robot arm is black with metallic joints and a mechanical gripper. The brown cloth bag has a red tag attached to it.",
      "relationship": "The robot arm is manipulating the brown cloth bag.",
      "location": "Left side of the frame.",
      "relative_size": "Large within the frame.",
      "orientation": "Facing the coat rack.",
      "pose": "The robot arm is extended towards the coat rack.",
      "action": "The robot arm holds the brown cloth bag and moves it towards the coat rack.",
      "state_changes": "The robot arm moves the bag towards the coat rack and then releases it."
    },
    {
      "description": "A brown cloth bag with a red tag.",
      "appearance_details": "The bag is made of a brown fabric and has a red tag attached to it.",
      "relationship": "The bag is being manipulated by the robot arm.",
      "location": "Held by the robot arm on the left side of the frame.",
      "relative_size": "Medium within the frame.",
      "orientation": "Facing the coat rack.",
      "pose": "The bag is held by the robot arm and then hung on the coat rack.",
      "action": "The bag is moved towards the coat rack and then hung on it.",
      "state_changes": "The bag is moved from the robot arm to the coat rack."
    },
    {
      "description": "A coat rack with a black pole and a circular base.",
      "appearance_details": "The coat rack has a black pole and a circular base. It is positioned near the curtain.",
      "relationship": "The coat rack is being used to hang the brown cloth bag.",
      "location": "Center of the frame.",
      "relative_size": "Medium within the frame.",
      "orientation": "Facing the robot arm.",
      "pose": "The coat rack is stationary.",
      "action": "The coat rack is used to hang the brown cloth bag.",
      "state_changes": "The coat rack now has the brown cloth bag hanging on it."
    },
    {
      "description": "A chair with a beige cushion.",
      "appearance_details": "The chair has a beige cushion and is positioned near the coat rack.",
      "relationship": "The chair is in the background.",
      "location": "Right side of the frame.",
      "relative_size": "Medium within the frame.",
      "orientation": "Facing the coat rack.",
      "pose": "The chair is stationary.",
      "action": "The chair remains stationary.",
      "state_changes": "No significant change."
    },
    {
      "description": "A white box.",
      "appearance_details": "The white box is positioned near the chair.",
      "relationship": "The white box is in the background.",
      "location": "Right side of the frame.",
      "relative_size": "Small within the frame.",
      "orientation": "Facing the camera.",
      "pose": "The white box is stationary.",
      "action": "The white box remains stationary.",
      "state_changes": "No significant change."
    }
  ],
  "background_setting": "The scene takes place indoors, with a coat rack, a chair, a curtain, and a white box visible in the background. The floor is made of wood, and the walls are painted in a light color.",
  "lighting": {
    "conditions": "Bright indoor lighting.",
    "direction": "Top-lit.",
    "shadows": "Soft shadows on the floor and walls.",
    "illumination_effect": "Even illumination across the scene."
  },
  "aesthetics": {
    "composition": "The scene is composed with the robot arm and the coat rack in the foreground, and the chair and white box in the background.",
    "color_scheme": "The scene features a mix of neutral colors, including beige, black, and white, with the brown cloth bag providing a pop of color.",
    "mood_atmosphere": "The scene has a calm and focused atmosphere, with the robot arm performing a precise task.",
    "patterns": "The floor has a wood grain pattern."
  },
  "cinematography": {
    "camera_angle": "Low angle.",
    "framing": "Medium shot.",
    "depth_of_field": "Deep.",
    "focus": "Sharp focus on the robot arm and the coat rack.",
    "lens_focal_length": "Wide-angle.",
    "camera_motion": "Static."
  },
  "style_medium": "Robotics video.",
  "artistic_style": "Real-world robot manipulation footage.",
  "context": "The video shows a robot arm manipulating a brown cloth bag and hanging it on a coat rack.",
  "actions": [
    {
      "time": "0:00-0:05",
      "description": "The robot arm holds the brown cloth bag and moves it towards the coat rack.",
      "actor": "Robot arm.",
      "object": "Brown cloth bag.",
      "contact_event": "The robot arm moves the bag towards the coat rack.",
      "state_change": "The bag is moved from the robot arm to the coat rack."
    },
    {
      "time": "0:05-0:10",
      "description": "The robot arm releases the brown cloth bag, which hangs on the coat rack.",
      "actor": "Robot arm.",
      "object": "Brown cloth bag.",
      "contact_event": "The robot arm releases the bag.",
      "state_change": "The bag is now hanging on the coat rack."
    },
    {
      "time": "0:10-0:11",
      "description": "The robot arm moves away from the coat rack.",
      "actor": "Robot arm.",
      "object": "None.",
      "contact_event": "The robot arm moves away.",
      "state_change": "The robot arm is no longer in contact with the bag."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0:00-0:05",
      "description": "The robot arm holds the brown cloth bag and moves it towards the coat rack.",
      "key_changes": "The bag is moved from the robot arm to the coat rack.",
      "camera": "Static low angle shot.",
      "robot_motion": "The robot arm moves the bag towards the coat rack.",
      "object_motion": "The bag is moved from the robot arm to the coat rack.",
      "physical_reasoning": "The robot arm uses its gripper to hold and move the bag."
    },
    {
      "segment_index": 1,
      "time_range": "0:05-0:10",
      "description": "The robot arm releases the brown cloth bag, which hangs on the coat rack.",
      "key_changes": "The bag is now hanging on the coat rack.",
      "camera": "Static low angle shot.",
      "robot_motion": "The robot arm releases the bag.",
      "object_motion": "The bag hangs on the coat rack.",
      "physical_reasoning": "The robot arm releases the bag, allowing it to hang on the coat rack."
    },
    {
      "segment_index": 2,
      "time_range": "0:10-0:11",
      "description": "The robot arm moves away from the coat rack.",
      "key_changes": "The robot arm is no longer in contact with the bag.",
      "camera": "Static low angle shot.",
      "robot_motion": "The robot arm moves away.",
      "object_motion": "The robot arm moves away.",
      "physical_reasoning": "The robot arm completes its task and moves away."
    }
  ],
  "transitions": [],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "4:3",
  "duration": "11.633s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Manipulator robot arm.",
    "end_effector": "Mechanical gripper.",
    "manipulated_objects": "Brown cloth bag.",
    "task_goal": "Hang the brown cloth bag on the coat rack.",
    "initial_state": "The robot arm holds the brown cloth bag.",
    "terminal_state": "The brown cloth bag hangs on the coat rack.",
    "affordances": "The coat rack provides a hook for hanging the bag.",
    "failure_modes": "The robot arm may fail to grip the bag or may drop it."
  },
  "physical_reasoning": "The robot arm uses its gripper to hold and move the bag. The bag is then hung on the coat rack, which provides a hook for hanging it. The robot arm completes its task and moves away.",
  "uncertainties": [],
  "comprehensive_t2v_caption": "A robot arm with a mechanical gripper manipulates a brown cloth bag with a red tag, moving it towards a coat rack. The robot arm successfully hangs the bag on the coat rack and then releases it, moving away from the rack. The scene takes place indoors, with a chair, a white box, and a curtain visible in the background. The lighting is bright and even, and the camera angle is low, focusing on the interaction between the robot arm and the bag."
}

Cosmos 3 Super (64B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Super (64B) · 19.9s · 2026-06-10T16:13:24+0000

Short Read

The robot arm grasps the brown cap, lifts it upwards, and places it onto the hook of the coat rack.

Training Caption Field

A black robot arm with a multi-jointed gripper grasps a brown baseball cap from a lower position, lifts it upwards, and places it onto the hook of a tall, slender black coat rack. The coat rack is positioned next to a beige curtain and a light brown upholstered chair. The floor is light wood laminate, and the walls are a muted gray. The robot arm is partially visible on the right side of the frame, with its base and some mechanical components showing. The cap is a classic baseball style with a curved brim and a red tag visible on the inside. The robot's gripper carefully grasps the cap, lifts it, and positions it onto the hook of the coat rack.

Full output JSON/text
{
  "scene_imagination": "A robot arm with a black gripper manipulates a brown cap, moving it from a lower position to hang on a coat rack hook. The coat rack is a tall, slender black pole with a circular base, positioned next to a beige curtain and a light brown upholstered chair. The floor is light wood laminate, and the walls are a muted gray. The robot arm is partially visible on the right side of the frame, with its base and some mechanical components showing. The cap is a classic baseball style with a curved brim and a red tag visible on the inside. The robot's gripper carefully grasps the cap, lifts it, and positions it onto the hook of the coat rack.",
  "temporal_caption": "The robot arm grasps the brown cap, lifts it upwards, and places it onto the hook of the coat rack.",
  "audio_description": "Quiet ambient room tone with subtle mechanical clicks and soft fabric rustling as the robot arm moves the cap.",
  "subjects": [
    {
      "description": "A black robot arm with a multi-jointed gripper, partially visible on the right side of the frame.",
      "appearance_details": "The arm is black with metallic joints and a logo on the base. The gripper has multiple fingers for grasping.",
      "relationship": "Interacting with the brown cap.",
      "location": "Right side of the frame, extending towards the center.",
      "relative_size": "Large within frame",
      "orientation": "Facing left towards the coat rack.",
      "pose": "Extended towards the coat rack.",
      "action": "Grasping, lifting, and placing the cap onto the coat rack hook.",
      "state_changes": "The arm moves from a lower position to a higher position, then retracts slightly after placing the cap."
    },
    {
      "description": "A brown baseball cap with a curved brim and a red tag on the inside.",
      "appearance_details": "The cap is made of a textured fabric, likely suede or a similar material. It has a curved brim and a red tag visible on the inside.",
      "relationship": "Being manipulated by the robot arm.",
      "location": "Center of the frame, moving from lower left to upper center.",
      "relative_size": "Medium within frame",
      "orientation": "Initially facing down, then facing forward as it is hung.",
      "pose": "Held by the robot arm, then hanging on the coat rack hook.",
      "action": "Being lifted and placed onto the coat rack hook.",
      "state_changes": "Moves from a lower position to a higher position, then hangs stationary on the hook."
    },
    {
      "description": "A tall, slender black coat rack with a circular base and a hook at the top.",
      "appearance_details": "The coat rack is made of black metal with a circular base and a hook at the top. It is positioned next to a beige curtain.",
      "relationship": "Receiving the brown cap from the robot arm.",
      "location": "Center of the frame, slightly to the left.",
      "relative_size": "Large within frame",
      "orientation": "Vertical.",
      "pose": "Standing upright.",
      "action": "Holding the brown cap on its hook.",
      "state_changes": "No significant change."
    }
  ],
  "background_setting": "An indoor room with light wood laminate flooring, a muted gray wall, and a beige curtain. A light brown upholstered chair is visible on the right side of the frame.",
  "lighting": {
    "conditions": "Bright indoor lighting.",
    "direction": "Top-lit, casting soft shadows downwards.",
    "shadows": "Soft shadows on the floor and wall.",
    "illumination_effect": "Even illumination across the scene, highlighting the textures of the cap and the metallic surfaces of the robot arm."
  },
  "aesthetics": {
    "composition": "Centered on the interaction between the robot arm and the coat rack.",
    "color_scheme": "Neutral tones with black, brown, beige, and gray dominating.",
    "mood_atmosphere": "Functional and focused on the task.",
    "patterns": "Linear patterns on the wood flooring and the vertical lines of the curtain."
  },
  "cinematography": {
    "camera_motion": "Static.",
    "framing": "Medium shot.",
    "camera_angle": "High angle, looking down at the scene.",
    "depth_of_field": "Deep focus, keeping both the foreground and background sharp.",
    "focus": "Sharp focus on the robot arm, cap, and coat rack.",
    "lens_focal_length": "Wide-angle."
  },
  "style_medium": "Real-world robot manipulation footage.",
  "artistic_style": "Documentary-style.",
  "context": "A demonstration of a robot arm performing a task, likely for research or development purposes.",
  "actions": [
    {
      "time": "0-3s",
      "description": "The robot arm grasps the brown cap from a lower position.",
      "actor": "Robot arm",
      "object": "Brown cap",
      "contact_event": "The gripper closes around the cap.",
      "state_change": "The cap is lifted from the lower position."
    },
    {
      "time": "3-7s",
      "description": "The robot arm lifts the cap upwards towards the coat rack hook.",
      "actor": "Robot arm",
      "object": "Brown cap",
      "contact_event": "The cap moves closer to the hook.",
      "state_change": "The cap is raised to the height of the hook."
    },
    {
      "time": "7-11s",
      "description": "The robot arm places the cap onto the coat rack hook and releases it.",
      "actor": "Robot arm",
      "object": "Brown cap",
      "contact_event": "The cap is placed on the hook and the gripper opens.",
      "state_change": "The cap hangs stationary on the hook."
    }
  ],
  "text_and_signage_elements": [
    {
      "text": "Logo on the robot arm base",
      "category": "physical_in_scene",
      "appearance": "Small, metallic logo on the black base of the robot arm.",
      "spatial": "Bottom right of the frame.",
      "temporal": "Static throughout the video.",
      "context": "Brand or manufacturer logo on the robot arm."
    }
  ],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0-3s",
      "description": "The robot arm grasps the brown cap from a lower position.",
      "key_changes": "The cap is lifted from the lower position.",
      "camera": "Static high angle shot.",
      "robot_motion": "The robot arm moves upwards to grasp the cap.",
      "object_motion": "The cap is lifted upwards.",
      "physical_logic": "The robot arm uses its gripper to grasp the cap."
    },
    {
      "segment_index": 1,
      "time_range": "3-7s",
      "description": "The robot arm lifts the cap upwards towards the coat rack hook.",
      "key_changes": "The cap is raised to the height of the hook.",
      "camera": "Static high angle shot.",
      "robot_motion": "The robot arm continues to move upwards.",
      "object_motion": "The cap moves upwards.",
      "physical_logic": "The robot arm lifts the cap to the correct height."
    },
    {
      "segment_index": 2,
      "time_range": "7-11s",
      "description": "The robot arm places the cap onto the coat rack hook and releases it.",
      "key_changes": "The cap is placed on the hook and hangs stationary.",
      "camera": "Static high angle shot.",
      "robot_motion": "The robot arm moves slightly forward and releases the cap.",
      "object_motion": "The cap is placed on the hook and hangs stationary.",
      "physical_logic": "The robot arm places the cap on the hook and releases it."
    }
  ],
  "transitions": [
    "No significant transitions."
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "16,9",
  "duration": "11.633s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "A black robot arm with a multi-jointed gripper.",
    "end_effector": "A multi-fingered gripper designed for grasping objects.",
    "manipulated_objects": [
      "A brown baseball cap."
    ],
    "task_goal": "To hang the brown cap on the coat rack hook.",
    "initial_state": "The robot arm is positioned near the cap, which is at a lower height.",
    "terminal_state": "The cap is hanging on the coat rack hook.",
    "affordances": "The robot arm can grasp, lift, and place objects.",
    "failure_modes": "The robot arm might drop the cap or fail to grasp it properly."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The robot arm uses its gripper to grasp the cap firmly.",
    "support_and_stability": "The robot arm provides support to lift and place the cap.",
    "force_gravity_friction": "The robot arm must overcome gravity to lift the cap and ensure it stays on the hook.",
    "causal_chain": "The robot arm grasps the cap, lifts it, places it on the hook, and releases it.",
    "object_permanence": "The cap remains in the scene throughout the video.",
    "next_action_prediction": "The robot arm will likely retract or move to a resting position after placing the cap."
  },
  "uncertainties": [],
  "comprehensive_t2v_caption": "A black robot arm with a multi-jointed gripper grasps a brown baseball cap from a lower position, lifts it upwards, and places it onto the hook of a tall, slender black coat rack. The coat rack is positioned next to a beige curtain and a light brown upholstered chair. The floor is light wood laminate, and the walls are a muted gray. The robot arm is partially visible on the right side of the frame, with its base and some mechanical components showing. The cap is a classic baseball style with a curved brim and a red tag visible on the inside. The robot's gripper carefully grasps the cap, lifts it, and positions it onto the hook of the coat rack."
}
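
The reported `aspect_ratio` strings can be cross-checked against the `resolution` block of the same output. The following is a minimal sketch of such a check, not part of the review pipeline: the helper names (`reduced_ratio`, `parse_ratio`, `ratio_matches`), the accepted separators, and the 2% tolerance are all assumptions made for illustration.

```python
from math import gcd
from typing import Optional, Tuple


def reduced_ratio(width: int, height: int) -> Tuple[int, int]:
    """Reduce a pixel resolution to its lowest-terms aspect ratio."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


def parse_ratio(text: str) -> Optional[float]:
    """Parse strings such as "4:3" or "16,9" into a float; None if unparseable."""
    # Hypothetical set of separators; "," is tolerated because it appears in the outputs.
    for sep in (":", ",", "/", "x"):
        parts = text.split(sep)
        if len(parts) == 2:
            try:
                num, den = float(parts[0]), float(parts[1])
            except ValueError:
                return None
            return num / den if den else None
    return None


def ratio_matches(reported: str, width: int, height: int, tol: float = 0.02) -> bool:
    """True if the reported ratio is within a relative tolerance of width / height."""
    value = parse_ratio(reported)
    if value is None:
        return False
    actual = width / height
    return abs(value - actual) / actual <= tol
```

Under these assumptions, a 1152 x 816 frame reduces to 24:17, so neither "4:3" nor "16,9" passes `ratio_matches` at a 2% tolerance. A looser tolerance would accept "4:3" but still reject "16,9".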

Cosmos 3 Super (64B) / p09: Reproduction schema

ok
p09: Reproduction schema · Cosmos 3 Super (64B) · 15.7s · 2026-06-10T16:13:39+0000

Short Read

A robot arm manipulates a brown bag, lifting and repositioning it on a black stand near a beige curtain and a chair.

Training Caption Field

A robot arm manipulates a brown bag, lifting and repositioning it on a black stand near a beige curtain and a chair.

Full output JSON/text
{
  "schema_version": "p09_reproduction_v1",
  "one_sentence_summary": "A robot arm manipulates a brown bag, lifting and repositioning it on a black stand near a beige curtain and a chair.",
  "video_metadata": {
    "resolution": {
      "W": 1152,
      "H": 816
    },
    "aspect_ratio": "16,9",
    "duration_sec": 11.633,
    "fps": 30.0,
    "frame_count": 349
  },
  "camera": {
    "viewpoint": "high angle",
    "framing": "medium shot",
    "camera_angle": "high angle",
    "camera_motion": "static",
    "estimated_height_or_distance": "1.5 meters",
    "lens_or_distortion": "none",
    "crop_boundaries": "none",
    "stability": "stable"
  },
  "scene_layout": {
    "coordinate_frame": "camera-centered",
    "background": "indoor room with a beige curtain, a chair, and a wooden floor",
    "surface": "wooden floor",
    "lighting": "even indoor lighting",
    "global_spatial_map": "robot arm on the left, black stand with a brown bag in the center, chair on the right",
    "occlusions": "none"
  },
  "subjects": [
    {
      "id": "robot_arm",
      "category": "robot arm",
      "appearance": "black mechanical arm with multiple joints and a gripper",
      "visible_parts": "arm, joints, gripper",
      "initial_pose": "holding a brown bag",
      "final_pose": "releasing the brown bag",
      "workspace_location": "left side of the frame",
      "motion_path": "lifting and repositioning the brown bag",
      "end_effector": "gripper",
      "uncertainties": "none"
    }
  ],
  "objects": [
    {
      "id": "brown_bag",
      "category": "bag",
      "appearance": "brown fabric bag with a red tag",
      "label_or_text": "none",
      "material": "fabric",
      "shape": "irregular",
      "size_relative_to_frame": "medium",
      "initial_pose": "held by the robot arm",
      "final_pose": "hanging on the black stand",
      "state_changes": "lifted and repositioned by the robot arm",
      "uncertainties": "none"
    },
    {
      "id": "black_stand",
      "category": "stand",
      "appearance": "black metal stand with a circular base",
      "label_or_text": "none",
      "material": "metal",
      "shape": "cylindrical",
      "size_relative_to_frame": "medium",
      "initial_pose": "standing upright",
      "final_pose": "standing upright",
      "state_changes": "none",
      "uncertainties": "none"
    },
    {
      "id": "chair",
      "category": "chair",
      "appearance": "beige upholstered chair with black legs",
      "label_or_text": "none",
      "material": "fabric",
      "shape": "rectangular",
      "size_relative_to_frame": "medium",
      "initial_pose": "stationary",
      "final_pose": "stationary",
      "state_changes": "none",
      "uncertainties": "none"
    }
  ],
  "initial_state": "The robot arm is holding a brown bag near a black stand.",
  "final_state": "The robot arm has released the brown bag, which is now hanging on the black stand.",
  "timeline": [
    {
      "time_range": "0-5s",
      "description": "The robot arm lifts the brown bag.",
      "active_subjects": [
        "robot_arm"
      ],
      "object_states": {
        "brown_bag": "being lifted"
      },
      "robot_motion": "lifting",
      "object_motion": "brown bag being lifted",
      "gripper_state": "closed",
      "contact_state": "robot arm gripping the brown bag"
    },
    {
      "time_range": "5-10s",
      "description": "The robot arm repositions the brown bag on the black stand.",
      "active_subjects": [
        "robot_arm"
      ],
      "object_states": {
        "brown_bag": "being repositioned"
      },
      "robot_motion": "repositioning",
      "object_motion": "brown bag being repositioned",
      "gripper_state": "closed",
      "contact_state": "robot arm gripping the brown bag"
    },
    {
      "time_range": "10-11s",
      "description": "The robot arm releases the brown bag.",
      "active_subjects": [
        "robot_arm"
      ],
      "object_states": {
        "brown_bag": "hanging on the black stand"
      },
      "robot_motion": "releasing",
      "object_motion": "brown bag hanging on the black stand",
      "gripper_state": "open",
      "contact_state": "robot arm no longer gripping the brown bag"
    }
  ],
  "parallel_action_graph": {
    "simultaneous_actions": [],
    "dependencies": [],
    "ordering_constraints": [
      "lifting before repositioning",
      "repositioning before releasing"
    ]
  },
  "keyframes": [
    {
      "time": 0,
      "frame_description": "The robot arm is holding a brown bag near a black stand.",
      "subject_poses": {
        "robot_arm": "holding the brown bag"
      },
      "object_poses": {
        "brown_bag": "held by the robot arm"
      },
      "contacts": [
        {
          "subject_id": "robot_arm",
          "object_id": "brown_bag",
          "contact_type": "gripping"
        }
      ]
    },
    {
      "time": 5,
      "frame_description": "The robot arm is lifting the brown bag.",
      "subject_poses": {
        "robot_arm": "lifting the brown bag"
      },
      "object_poses": {
        "brown_bag": "being lifted"
      },
      "contacts": [
        {
          "subject_id": "robot_arm",
          "object_id": "brown_bag",
          "contact_type": "gripping"
        }
      ]
    },
    {
      "time": 10,
      "frame_description": "The robot arm is repositioning the brown bag on the black stand.",
      "subject_poses": {
        "robot_arm": "repositioning the brown bag"
      },
      "object_poses": {
        "brown_bag": "being repositioned"
      },
      "contacts": [
        {
          "subject_id": "robot_arm",
          "object_id": "brown_bag",
          "contact_type": "gripping"
        }
      ]
    },
    {
      "time": 11,
      "frame_description": "The robot arm has released the brown bag, which is now hanging on the black stand.",
      "subject_poses": {
        "robot_arm": "releasing the brown bag"
      },
      "object_poses": {
        "brown_bag": "hanging on the black stand"
      },
      "contacts": []
    }
  ],
  "contact_events": [
    {
      "time": 0,
      "subject_id": "robot_arm",
      "object_id": "brown_bag",
      "contact_type": "gripping",
      "contact_location": "gripper",
      "grasp_or_support_quality": "secure",
      "release_or_transfer": "none",
      "visible_evidence": "robot arm gripping the brown bag"
    },
    {
      "time": 11,
      "subject_id": "robot_arm",
      "object_id": "brown_bag",
      "contact_type": "releasing",
      "contact_location": "gripper",
      "grasp_or_support_quality": "none",
      "release_or_transfer": "none",
      "visible_evidence": "robot arm no longer gripping the brown bag"
    }
  ],
  "physics_and_affordances": {
    "support_surfaces": [
      "wooden floor",
      "black stand"
    ],
    "gravity_and_stability": "stable",
    "force_or_constraint_reasoning": "none",
    "collision_or_near_collision": "none",
    "object_permanence": "none",
    "task_goal": "manipulate the brown bag",
    "failure_modes": "none",
    "next_action_prediction": "none"
  },
  "appearance_fidelity": {
    "colors": {
      "robot_arm": "black",
      "brown_bag": "brown",
      "black_stand": "black",
      "chair": "beige"
    },
    "textures": {
      "robot_arm": "metal",
      "brown_bag": "fabric",
      "black_stand": "metal",
      "chair": "fabric"
    },
    "lighting_and_shadows": "even indoor lighting",
    "robot_morphology": "black mechanical arm with multiple joints and a gripper",
    "object_fine_details": {
      "brown_bag": "brown fabric bag with a red tag",
      "black_stand": "black metal stand with a circular base",
      "chair": "beige upholstered chair with black legs"
    }
  },
  "text_and_labels": [
    {
      "object_id": "brown_bag",
      "raw_text": "none",
      "legibility": "none",
      "appearance": "none",
      "spatial_location": "none",
      "time_visible": "none"
    }
  ],
  "audio_description": "none",
  "generation_controls": {
    "shot_type": "medium shot",
    "camera_motion_instruction": "static",
    "motion_style": "none",
    "duration_instruction": "11.633s",
    "spatial_consistency_requirements": "none",
    "negative_constraints": "none",
    "recommended_caption_style": "none"
  },
  "uncertainties": [],
  "comprehensive_reproduction_caption": "A robot arm manipulates a brown bag, lifting and repositioning it on a black stand near a beige curtain and a chair."
}
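
The p09 `video_metadata` and `timeline` fields lend themselves to two simple arithmetic checks: whether `frame_count` agrees with `duration_sec` times `fps`, and how much of the clip the last `timeline` range leaves uncovered. The sketch below is illustrative only. The function names are hypothetical, and the range parser assumes the "A-Bs" format used in this particular output.

```python
def expected_frame_count(duration_sec: float, fps: float) -> int:
    """Frame count implied by duration and frame rate, rounded to the nearest frame."""
    return round(duration_sec * fps)


def uncovered_tail(time_ranges, duration_sec: float) -> float:
    """Seconds between the end of the last "A-Bs" range and the clip duration."""
    # Assumes every range looks like "5-10s": start, hyphen, end, trailing "s".
    last_end = max(float(r.rstrip("s").split("-")[1]) for r in time_ranges)
    return max(0.0, duration_sec - last_end)
```

For the values in this output, `expected_frame_count(11.633, 30.0)` gives 349, which agrees with the reported `frame_count`. The timeline ends at 11 s, so `uncovered_tail(["0-5s", "5-10s", "10-11s"], 11.633)` reports roughly 0.633 s of the clip with no timeline entry.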

rovid_007

Qwen 3.5 9B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 9B · 55.8s · 2026-06-10T17:49:07+0000

Short Read

At the beginning, a robotic arm hovers over a box of water bottles. In the middle, the arm descends and grasps a bottle from the back row of the box. By the end, the robot lifts the bottle vertically, holding it in the air.

Training Caption Field

A high-angle static shot shows a robotic arm with a black gripper interacting with a cardboard box full of water bottles on a white table. Initially, the arm hovers over the box. It then moves closer to a specific bottle in the back row, grasps it securely, and lifts it vertically into the air, leaving the rest of the bottles undisturbed in the box.

Full output JSON/text
{
  "scene_imagination": "A high-angle, static shot captures a white tabletop workspace. A cardboard box filled with water bottles sits in the foreground. Behind the box, four empty water bottles are lined up on the table. A robotic arm with a black gripper enters from the right, hovering over the box.",
  "temporal_caption": "At the beginning, a robotic arm hovers over a box of water bottles. In the middle, the arm descends and grasps a bottle from the back row of the box. By the end, the robot lifts the bottle vertically, holding it in the air.",
  "audio_description": null,
  "subjects": [
    {
      "description": "Robotic arm with black gripper",
      "appearance_details": "White arm segments, black articulated gripper, visible cables and joints",
      "relationship": "Manipulator",
      "location": "Right side of the frame",
      "relative_size": "Large compared to bottles",
      "orientation": "Angled towards the box",
      "pose": "Hovering, then grasping",
      "action": "Grasping and lifting a bottle",
      "state_changes": "Moves from hover to grasp to lift"
    },
    {
      "description": "Water bottles",
      "appearance_details": "Clear plastic, white caps, some with red labels",
      "relationship": "Objects being manipulated",
      "location": "Inside the cardboard box and on the table",
      "relative_size": "Standard size",
      "orientation": "Upright",
      "pose": "Static until one is moved",
      "action": "Being picked up",
      "state_changes": "One bottle moves from inside the box to being held in the air"
    }
  ],
  "background_setting": "A clean, white laboratory or testing environment with a seamless white table surface.",
  "lighting": {
    "conditions": "Bright, even indoor lighting",
    "direction": "Overhead",
    "shadows": "Soft shadows cast by the robot and bottles",
    "illumination_effect": "Clear visibility of object details"
  },
  "aesthetics": {
    "composition": "Centered on the box and robot interaction",
    "color_scheme": "Monochromatic white and grey with black accents and clear plastic",
    "mood_atmosphere": "Clinical, precise, industrial",
    "patterns": "Grid-like arrangement of bottles"
  },
  "cinematography": {
    "framing": "Medium close-up",
    "camera_angle": "High angle, looking down",
    "depth_of_field": "Deep focus",
    "focus": "Sharp on the robot and bottles",
    "lens_focal_length": "Unknown",
    "camera_motion": "Static"
  },
  "style_medium": "robotics video",
  "artistic_style": "real-world robot manipulation footage",
  "context": "A demonstration of robotic manipulation capabilities, specifically object grasping and lifting.",
  "actions": [
    {
      "time": "00:00 - 00:05",
      "description": "The robotic arm hovers over the cardboard box.",
      "actor": "Robotic arm",
      "object": "Box of bottles",
      "contact_event": null,
      "state_change": null,
      "evidence": "Arm remains stationary above the box"
    },
    {
      "time": "00:05 - 00:09",
      "description": "The robotic arm approaches a specific bottle in the back row of the box.",
      "actor": "Robotic arm",
      "object": "Bottle in box",
      "contact_event": null,
      "state_change": null,
      "evidence": "Arm moves closer to the target bottle"
    },
    {
      "time": "00:09 - 00:11",
      "description": "The robotic arm grasps the bottle.",
      "actor": "Robotic arm",
      "object": "Bottle in box",
      "contact_event": "Gripper closes around bottle",
      "state_change": "Bottle is secured in gripper",
      "evidence": "Gripper fingers wrap around the bottle neck"
    },
    {
      "time": "00:11 - 00:12",
      "description": "The robotic arm lifts the bottle out of the box.",
      "actor": "Robotic arm",
      "object": "Bottle",
      "contact_event": "Lifting force applied",
      "state_change": "Bottle moves from inside box to air",
      "evidence": "Bottle is suspended in the air"
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "00:00 - 00:05",
      "description": "The robot arm hovers over the box.",
      "key_changes": "None",
      "camera": "Static",
      "robot_motion": "Hovering",
      "object_motion": "None",
      "physical_logic": "Pre-grasp positioning"
    },
    {
      "segment_index": 2,
      "time_range": "00:05 - 00:09",
      "description": "The robot arm approaches the target bottle.",
      "key_changes": "Arm moves closer",
      "camera": "Static",
      "robot_motion": "Translation towards object",
      "object_motion": "None",
      "physical_logic": "Approach vector alignment"
    },
    {
      "segment_index": 3,
      "time_range": "00:09 - 00:12",
      "description": "The robot grasps and lifts the bottle.",
      "key_changes": "Bottle is lifted",
      "camera": "Static",
      "robot_motion": "Lifting motion",
      "object_motion": "Vertical translation",
      "physical_logic": "Grasp and lift sequence"
    }
  ],
  "transitions": [
    "Hover to approach",
    "Approach to grasp",
    "Grasp to lift"
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "4:3",
  "duration": "12.167s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Industrial robotic arm",
    "end_effector": "Black parallel gripper",
    "manipulated_objects": "Water bottles",
    "task_goal": "Pick and place (or lift) a bottle",
    "initial_state": "Arm hovering over box",
    "terminal_state": "Arm holding bottle in air",
    "affordances": "Gripper designed for cylindrical objects",
    "failure_modes": "None observed"
  },
  "physical_reasoning": {
    "contact_and_grasp": "Gripper successfully closes around the bottle neck",
    "support_and_stability": "Bottle is stable in the gripper",
    "force_gravity_friction": "Lifting force overcomes gravity",
    "causal_chain": "Approach -> Grasp -> Lift",
    "object_permanence": "Bottle remains intact",
    "next_action_prediction": "Place bottle on the table or in another location"
  },
  "uncertainties": [],
  "comprehensive_t2v_caption": "A high-angle static shot shows a robotic arm with a black gripper interacting with a cardboard box full of water bottles on a white table. Initially, the arm hovers over the box. It then moves closer to a specific bottle in the back row, grasps it securely, and lifts it vertically into the air, leaving the rest of the bottles undisturbed in the box."
}
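
The outputs in this review write time ranges in at least three styles ("0:10-0:11", "0-3s", "00:00 - 00:05"), which makes segment boundaries awkward to compare across models. One way to normalize them is sketched below. It assumes only these three styles occur and that no range exceeds the minutes:seconds form. `to_seconds`, `parse_time_range`, and `is_contiguous` are hypothetical helper names.

```python
import re
from typing import List, Tuple


def to_seconds(token: str) -> float:
    """Convert "0:10", "00:05", "3s", or "3" to seconds."""
    token = token.strip().rstrip("s")
    if ":" in token:
        minutes, seconds = token.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(token)


def parse_time_range(text: str) -> Tuple[float, float]:
    """Split a "start-end" string (optional spaces around the hyphen) into seconds."""
    start, end = re.split(r"\s*-\s*", text.strip(), maxsplit=1)
    return to_seconds(start), to_seconds(end)


def is_contiguous(ranges: List[str]) -> bool:
    """True if each range starts exactly where the previous one ends."""
    parsed = [parse_time_range(r) for r in ranges]
    return all(prev[1] == nxt[0] for prev, nxt in zip(parsed, parsed[1:]))
```

With this sketch, the three `segments` ranges in the output above parse as contiguous, while a pair such as "00:00 - 00:05" followed by "00:06 - 00:09" does not, because one second is left unassigned.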

Qwen 3.5 27B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 27B · 172.4s · 2026-06-10T18:17:10+0000

Short Read

At the beginning, the scene is static with two robotic arms positioned around a box of bottles. In the early phase, the right robotic arm moves its gripper towards the box. In the middle phase, the gripper descends and precisely grasps one of the bottles from the top right corner of the box. Towards the end, the arm successfully lifts the bottle out of the box and holds it in the air.

Training Caption Field

A high-angle, static shot captures a robotic manipulation task on a white tabletop. A brown cardboard box filled with plastic bottles containing a pinkish liquid sits in the foreground. A white and black robotic arm enters from the top right, its black multi-fingered gripper moving towards the box. The arm precisely positions itself over a bottle in the top right corner of the box. The gripper then closes, securely grasping the bottle. With a smooth upward motion, the robotic arm lifts the bottle out of the box, holding it in the air. A second, stationary black gripper is visible on the left side of the frame. The scene is brightly lit, emphasizing the clean, functional environment of the robotic workspace.

Full output JSON/text
{
  "scene_imagination": "A high-angle, static shot captures a white tabletop workspace. In the foreground, a brown cardboard box is filled with numerous plastic bottles containing a pinkish liquid. Behind the box, four clear plastic water bottles are lined up. Two robotic arms are present: a black gripper on the left remains stationary, while a white and black robotic arm on the right is active. The lighting is bright and even, casting soft shadows.",
  "temporal_caption": "At the beginning, the scene is static with two robotic arms positioned around a box of bottles. In the early phase, the right robotic arm moves its gripper towards the box. In the middle phase, the gripper descends and precisely grasps one of the bottles from the top right corner of the box. Towards the end, the arm successfully lifts the bottle out of the box and holds it in the air.",
  "audio_description": null,
  "subjects": [
    {
      "description": "A white and black robotic arm with a black multi-fingered gripper.",
      "appearance_details": "The arm has white segments and black joints, with a complex black gripper at the end.",
      "relationship": "The active manipulator in the scene.",
      "location": "Enters from the top right of the frame.",
      "relative_size": "Large, dominating the right side of the frame.",
      "orientation": "Oriented downwards towards the box.",
      "pose": "Initially open, then closing to grasp an object.",
      "action": "Moves to grasp and lift a bottle.",
      "state_changes": "Transitions from an idle state to an active manipulation state."
    },
    {
      "description": "A black robotic gripper.",
      "appearance_details": "A stationary black gripper attached to a base on the left.",
      "relationship": "Passive observer in the scene.",
      "location": "Bottom left of the frame.",
      "relative_size": "Medium.",
      "orientation": "Facing towards the center.",
      "pose": "Stationary and open.",
      "action": "None.",
      "state_changes": "None."
    },
    {
      "description": "A plastic bottle with pink liquid.",
      "appearance_details": "Clear plastic bottle with a white cap, filled with a translucent pink liquid.",
      "relationship": "The object being manipulated.",
      "location": "Initially inside the cardboard box, then lifted into the air.",
      "relative_size": "Small.",
      "orientation": "Upright, then tilted as it is lifted.",
      "pose": "Upright in the box, held by the gripper.",
      "action": "Being grasped and lifted.",
      "state_changes": "Changes from being at rest in the box to being held in the air."
    }
  ],
  "background_setting": "A clean, white tabletop in what appears to be a laboratory or testing environment. A cardboard box and several bottles are the primary objects on the table.",
  "lighting": {
    "conditions": "Bright, artificial indoor lighting.",
    "direction": "Overhead and diffuse.",
    "shadows": "Soft shadows are cast by the robot arms and the box onto the white table.",
    "illumination_effect": "Evenly illuminates the scene, highlighting the objects and the robot's movements without harsh glare."
  },
  "aesthetics": {
    "composition": "High-angle shot with the box of bottles in the lower center and the active robot arm entering from the top right.",
    "color_scheme": "Neutral palette of white, brown, and black, with the pink liquid providing a subtle color accent.",
    "mood_atmosphere": "Clinical, precise, and functional.",
    "patterns": "Repetitive pattern of the bottles in the box and the line of water bottles."
  },
  "cinematography": {
    "framing": "Medium shot, capturing the workspace and the robot's actions.",
    "camera_angle": "High-angle, looking down at the table.",
    "depth_of_field": "Deep, with all elements from the foreground box to the background bottles in focus.",
    "focus": "Sharp focus on the robot arm and the bottles.",
    "lens_focal_length": "Unknown, likely a standard or slightly wide-angle lens.",
    "camera_motion": "Static."
  },
  "style_medium": "robotics video",
  "artistic_style": "real-world robot manipulation footage",
  "context": "The video demonstrates a pick-and-place task performed by a dual-arm robotic system. The goal is to pick a specific bottle from a container.",
  "actions": [
    {
      "time": "00:00 - 00:05",
      "description": "The right robotic arm moves its gripper towards the cardboard box.",
      "actor": "Right robotic arm",
      "object": "Cardboard box",
      "contact_event": false,
      "state_change": "Robot arm changes position.",
      "evidence": "The arm is seen moving from the top right towards the center."
    },
    {
      "time": "00:06 - 00:09",
      "description": "The gripper closes around a bottle in the top right of the box.",
      "actor": "Right robotic arm",
      "object": "Pink liquid bottle",
      "contact_event": true,
      "state_change": "Gripper makes contact and grasps the bottle.",
      "evidence": "The gripper fingers are seen closing around the bottle's body."
    },
    {
      "time": "00:10 - 00:12",
      "description": "The robotic arm lifts the bottle out of the box.",
      "actor": "Right robotic arm",
      "object": "Pink liquid bottle",
      "contact_event": true,
      "state_change": "Bottle is lifted from its resting position.",
      "evidence": "The bottle is clearly seen being raised into the air, detached from the other bottles."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "00:00 - 00:05",
      "description": "Initial state with the robot arm approaching the target.",
      "key_changes": "Robot arm begins its approach.",
      "camera": "Static.",
      "robot_motion": "Right arm moves towards the box.",
      "object_motion": "None.",
      "physical_logic": "The robot is positioning itself for the grasp."
    },
    {
      "segment_index": 2,
      "time_range": "00:06 - 00:09",
      "description": "The robot arm grasps the bottle.",
      "key_changes": "Gripper closes on the bottle.",
      "camera": "Static.",
      "robot_motion": "Gripper closes.",
      "object_motion": "Bottle is held.",
      "physical_logic": "A successful grasp is established."
    },
    {
      "segment_index": 3,
      "time_range": "00:10 - 00:12",
      "description": "The robot arm lifts the bottle.",
      "key_changes": "Bottle is lifted out of the box.",
      "camera": "Static.",
      "robot_motion": "Arm lifts upwards.",
      "object_motion": "Bottle moves upwards.",
      "physical_logic": "The robot overcomes gravity to lift the object."
    }
  ],
  "transitions": [
    "Approach to grasp",
    "Grasp to lift"
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "16:9",
  "duration": "12.167s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Dual-arm robotic system with one active arm and one passive gripper.",
    "end_effector": "Multi-fingered black gripper.",
    "manipulated_objects": "Plastic bottle with pink liquid.",
    "task_goal": "Pick a bottle from a box.",
    "initial_state": "Bottle is in the box, robot arm is idle.",
    "terminal_state": "Bottle is held in the air by the robot arm.",
    "affordances": "The bottle's shape allows for a stable grasp by the gripper.",
    "failure_modes": "Potential for dropping the bottle or failing to grasp it securely."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The gripper makes firm contact with the bottle's body, securing it for lifting.",
    "support_and_stability": "The bottle is initially supported by the box and other bottles. Once lifted, it is supported by the gripper.",
    "force_gravity_friction": "The robot applies an upward force greater than the bottle's weight to lift it. Friction between the gripper and the bottle prevents slipping.",
    "causal_chain": "The robot's movement causes the gripper to close, which causes the bottle to be lifted.",
    "object_permanence": "The bottle remains a consistent object throughout the manipulation.",
    "next_action_prediction": "The robot will likely move the bottle to a new location or place it elsewhere."
  },
  "uncertainties": [
    "The specific brand or contents of the pink liquid are unknown.",
    "The purpose of the stationary gripper on the left is unclear."
  ],
  "comprehensive_t2v_caption": "A high-angle, static shot captures a robotic manipulation task on a white tabletop. A brown cardboard box filled with plastic bottles containing a pinkish liquid sits in the foreground. A white and black robotic arm enters from the top right, its black multi-fingered gripper moving towards the box. The arm precisely positions itself over a bottle in the top right corner of the box. The gripper then closes, securely grasping the bottle. With a smooth upward motion, the robotic arm lifts the bottle out of the box, holding it in the air. A second, stationary black gripper is visible on the left side of the frame. The scene is brightly lit, emphasizing the clean, functional environment of the robotic workspace."
}

Cosmos 3 Nano (16B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Nano (16B) · 15.8s · 2026-06-10T18:37:09+0000

Short Read

The video begins with both robot arms positioned above the cardboard box filled with water bottles. The right robot arm moves its gripper towards a bottle on the right side of the box. It then grasps the bottle and lifts it out of the box. The left robot arm remains stationary throughout the sequence. The right robot arm holds the lifted bottle steady for a moment before the video ends.

Training Caption Field

A top-down view of a white table with a cardboard box filled with plastic water bottles. Two robot arms with black grippers are positioned on either side of the box. The right robot arm moves its gripper towards a water bottle on the right side of the box, grasps it, and lifts it out of the box. The left robot arm remains stationary. The lighting is bright and even, highlighting the transparency of the plastic bottles and the texture of the cardboard box. The video ends with the right robot arm holding the lifted water bottle steady.

Full output JSON/text
{
  "scene_imagination": "A top-down view of a white table with a cardboard box filled with plastic water bottles. Two robot arms with black grippers are positioned on either side of the box. The box contains multiple rows of clear plastic bottles with white caps, some with red labels. The robot arms appear to be preparing to manipulate the bottles. The lighting is bright and even, typical of a laboratory or industrial setting.",
  "temporal_caption": "The video begins with both robot arms positioned above the cardboard box filled with water bottles. The right robot arm moves its gripper towards a bottle on the right side of the box. It then grasps the bottle and lifts it out of the box. The left robot arm remains stationary throughout the sequence. The right robot arm holds the lifted bottle steady for a moment before the video ends.",
  "audio_description": "The sound of the robot gripper closing around the bottle and lifting it from the box. There is also a faint hum from the robot arms and possibly the room's ventilation system.",
  "subjects": [
    {
      "description": "A second robot arm with a black gripper mechanism.",
      "appearance_details": "Similar to the first robot arm, it is white with black joints and a black gripper.",
      "relationship": "Positioned on the left side of the frame, stationary and not interacting with the objects.",
      "location": "Left side of the frame.",
      "relative_size": "Large within the frame.",
      "orientation": "Facing towards the center of the table.",
      "pose": "Extended towards the cardboard box.",
      "action": "Remains stationary throughout the video.",
      "state_changes": "No significant change."
    },
    {
      "description": "A cardboard box filled with plastic water bottles.",
      "appearance_details": "The box is brown and open, revealing multiple rows of clear plastic bottles with white caps. Some bottles have red labels.",
      "relationship": "Placed in the center of the table, between the two robot arms.",
      "location": "Center of the frame.",
      "relative_size": "Medium within the frame.",
      "orientation": "Facing upwards.",
      "pose": "Resting flat on the table.",
      "action": "Remains stationary while one bottle is removed by the right robot arm.",
      "state_changes": "One bottle is removed from the box."
    },
    {
      "description": "Multiple plastic water bottles.",
      "appearance_details": "Clear plastic bottles with white caps. Some have red labels with text.",
      "relationship": "Arranged in rows inside the cardboard box.",
      "location": "Inside the cardboard box.",
      "relative_size": "Small within the frame.",
      "orientation": "Upright.",
      "pose": "Standing upright in the box.",
      "action": "One bottle is grasped and lifted out of the box by the right robot arm.",
      "state_changes": "One bottle is removed from the box."
    }
  ],
  "background_setting": "A white table surface in what appears to be a laboratory or industrial setting. The table is clean and uncluttered, with only the cardboard box and robot arms visible.",
  "lighting": {
    "conditions": "Bright, even lighting.",
    "direction": "Top-down lighting.",
    "shadows": "Soft shadows cast by the robot arms and the cardboard box.",
    "illumination_effect": "The lighting highlights the transparency of the plastic bottles and the texture of the cardboard box."
  },
  "aesthetics": {
    "composition": "Centered composition with the cardboard box in the middle and the robot arms on either side.",
    "color_scheme": "Neutral color scheme with white, black, brown, and clear plastic.",
    "mood_atmosphere": "Clinical and precise.",
    "patterns": "The rows of water bottles create a repetitive pattern inside the box."
  },
  "cinematography": {
    "camera_angle": "Top-down angle.",
    "framing": "Wide shot.",
    "depth_of_field": "Deep depth of field.",
    "focus": "Sharp focus on the robot arms, cardboard box, and water bottles.",
    "lens_focal_length": "Standard lens.",
    "camera_motion": "Static camera."
  },
  "style_medium": "Robotics video.",
  "artistic_style": "Real-world robot manipulation footage.",
  "context": "A demonstration of a robot arm picking up an object from a box.",
  "actions": [
    {
      "time": "0:00-0:03",
      "description": "The right robot arm moves its gripper towards a water bottle on the right side of the cardboard box.",
      "actor": "Right robot arm.",
      "object": "Water bottle.",
      "contact_event": "The gripper makes contact with the water bottle.",
      "state_change": "The gripper closes around the water bottle."
    },
    {
      "time": "0:03-0:06",
      "description": "The right robot arm lifts the water bottle out of the cardboard box.",
      "actor": "Right robot arm.",
      "object": "Water bottle.",
      "contact_event": "The water bottle is lifted upwards.",
      "state_change": "The water bottle is removed from the box."
    },
    {
      "time": "0:06-0:12",
      "description": "The right robot arm holds the lifted water bottle steady for a moment.",
      "actor": "Right robot arm.",
      "object": "Water bottle.",
      "contact_event": "The water bottle is held in the air.",
      "state_change": "The water bottle remains in the air."
    }
  ],
  "text_and_signage_elements": [
    {
      "text": "Red labels on some water bottles.",
      "category": "physical_in_scene",
      "appearance": "Red labels with white text.",
      "spatial": "On some water bottles inside the cardboard box.",
      "spatial_temporal": "Visible throughout the video.",
      "context": "Labels on the water bottles."
    }
  ],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0:00-0:03",
      "description": "The right robot arm moves towards a water bottle in the cardboard box.",
      "key_changes": "The robot arm moves closer to the bottle.",
      "camera": "Static top-down shot.",
      "robot_motion": "The right robot arm moves its gripper towards the bottle.",
      "object_motion": "The water bottle remains stationary.",
      "physical_reasoning": "The robot arm is preparing to grasp the bottle."
    },
    {
      "segment_index": 1,
      "time_range": "0:03-0:06",
      "description": "The right robot arm grasps and lifts the water bottle out of the box.",
      "key_changes": "The water bottle is removed from the box.",
      "camera": "Static top-down shot.",
      "robot_motion": "The robot arm closes its gripper around the bottle and lifts it upwards.",
      "object_motion": "The water bottle is lifted out of the box.",
      "physical_reasoning": "The robot arm successfully picks up the bottle."
    },
    {
      "segment_index": 2,
      "time_range": "0:06-0:12",
      "description": "The right robot arm holds the lifted water bottle steady.",
      "key_changes": "The water bottle remains in the air.",
      "camera": "Static top-down shot.",
      "robot_motion": "The robot arm holds the bottle steady.",
      "object_motion": "The water bottle remains in the air.",
      "physical_reasoning": "The robot arm is holding the bottle."
    }
  ],
  "transitions": [
    "No notable transitions."
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "4:3",
  "duration": "12.167s",
  "fps": 30,
  "robotics": {
    "embodiment": "Two robot arms with black grippers.",
    "end_effector": "Black grippers.",
    "manipulated_objects": "Plastic water bottles.",
    "task_goal": "Pick up a water bottle from a cardboard box.",
    "initial_state": "Both robot arms are positioned above the cardboard box. The right robot arm moves towards a water bottle.",
    "terminal_state": "The right robot arm has lifted a water bottle out of the box and is holding it steady.",
    "affordances": "The water bottles are upright and accessible.",
    "failure_modes": "The robot arm could fail to grasp the bottle or could drop it."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The robot arm's gripper makes contact with the water bottle and closes around it to grasp it.",
    "support_and_stability": "The water bottle is supported by the robot arm's gripper once it is lifted out of the box.",
    "force_and_motion": "The robot arm applies a force to lift the water bottle out of the box.",
    "causal_chain": "The robot arm moves towards the bottle, grasps it, and lifts it out of the box.",
    "object_permanence": "The water bottle is removed from the box and remains in the robot arm's gripper.",
    "next_action_prediction": "The robot arm will likely move the lifted water bottle to a different location."
  },
  "uncertainties": [],
  "comprehensive_t2v_caption": "A top-down view of a white table with a cardboard box filled with plastic water bottles. Two robot arms with black grippers are positioned on either side of the box. The right robot arm moves its gripper towards a water bottle on the right side of the box, grasps it, and lifts it out of the box. The left robot arm remains stationary. The lighting is bright and even, highlighting the transparency of the plastic bottles and the texture of the cardboard box. The video ends with the right robot arm holding the lifted water bottle steady."
}

Cosmos 3 Super (64B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Super (64B) · 16.2s · 2026-06-10T16:13:56+0000

Short Read

The right robot arm moves its gripper towards the cardboard box, grasps a bottle, and lifts it out of the box.

Training Caption Field

A white and black robot arm moves towards a cardboard box filled with clear plastic bottles, grasps one, and lifts it out.

Full output JSON/text
{
  "scene_imagination": "Two robot arms are positioned on either side of a white table. On the left, a black robot arm with a gripper is stationary. On the right, a white and black robot arm with a gripper is actively moving. In the center of the table, there is a brown cardboard box filled with clear plastic bottles with white caps. To the left of the box, there are four more bottles arranged in a row. The background is a plain white wall with a black baseboard.",
  "temporal_caption": "The right robot arm moves its gripper towards the cardboard box, grasps a bottle, and lifts it out of the box.",
  "audio_description": "The video does not contain any audio.",
  "subjects": [
    {
      "description": "A black robot arm with a gripper, positioned on the left side of the table.",
      "appearance_details": "The arm is black with a silver base and a black gripper.",
      "relationship": "Stationary on the left side of the table.",
      "location": "Left side of the table.",
      "relative_size": "Large within the frame.",
      "orientation": "Facing the center of the table.",
      "pose": "Stationary.",
      "action": "Stationary.",
      "state_changes": "No significant change."
    },
    {
      "description": "A white and black robot arm with a gripper, positioned on the right side of the table.",
      "appearance_details": "The arm is white with black joints and a black gripper. It has a blue light on its base.",
      "relationship": "Moving towards the cardboard box.",
      "location": "Right side of the table.",
      "relative_size": "Large within the frame.",
      "orientation": "Facing the center of the table.",
      "pose": "Moving.",
      "action": "Moves its gripper towards the cardboard box, grasps a bottle, and lifts it out of the box.",
      "state_changes": "Starts stationary, moves towards the box, grasps a bottle, and lifts it."
    },
    {
      "description": "A brown cardboard box filled with clear plastic bottles with white caps.",
      "appearance_details": "The box is open and contains multiple bottles.",
      "relationship": "Located in the center of the table.",
      "location": "Center of the table.",
      "relative_size": "Medium within the frame.",
      "orientation": "Facing upwards.",
      "pose": "Stationary.",
      "action": "Stationary.",
      "state_changes": "No significant change."
    },
    {
      "description": "Four clear plastic bottles with white caps, arranged in a row on the left side of the table.",
      "appearance_details": "The bottles are clear with white caps and labels.",
      "relationship": "Located to the left of the cardboard box.",
      "location": "Left side of the table.",
      "relative_size": "Small within the frame.",
      "orientation": "Facing upwards.",
      "pose": "Stationary.",
      "action": "Stationary.",
      "state_changes": "No significant change."
    }
  ],
  "background_setting": "A white table with a plain white wall and a black baseboard in the background.",
  "lighting": {
    "conditions": "Bright indoor lighting.",
    "direction": "Top-lit.",
    "shadows": "Soft shadows on the table.",
    "illumination_effect": "Even illumination across the scene."
  },
  "aesthetics": {
    "composition": "Centered composition with the cardboard box in the middle.",
    "color_scheme": "Monochromatic with white, black, and brown tones.",
    "mood_atmosphere": "Neutral and industrial.",
    "patterns": "None."
  },
  "cinematography": {
    "camera_motion": "Static.",
    "framing": "Wide shot.",
    "camera_angle": "High angle.",
    "depth_of_field": "Deep.",
    "focus": "Sharp focus on the robot arms and the box.",
    "lens_focal_length": "Standard."
  },
  "style_medium": "Robotics video.",
  "artistic_style": "Realistic.",
  "context": "A demonstration of a robot arm picking up a bottle from a box.",
  "actions": [
    {
      "time": "0-5s",
      "description": "The right robot arm moves its gripper towards the cardboard box.",
      "actor": "Right robot arm.",
      "object": "Cardboard box.",
      "contact_event": "The gripper approaches the box."
    },
    {
      "time": "5-10s",
      "description": "The right robot arm grasps a bottle from the box.",
      "actor": "Right robot arm.",
      "object": "Bottle.",
      "contact_event": "The gripper grasps the bottle."
    },
    {
      "time": "10-15s",
      "description": "The right robot arm lifts the bottle out of the box.",
      "actor": "Right robot arm.",
      "object": "Bottle.",
      "contact_event": "The gripper lifts the bottle."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0-5s",
      "description": "The right robot arm moves its gripper towards the cardboard box.",
      "key_changes": "The right robot arm moves towards the box.",
      "camera": "Static wide shot.",
      "robot_motion": "The right robot arm moves towards the box.",
      "object_motion": "None.",
      "physical_logic": "The robot arm moves towards the box."
    },
    {
      "segment_index": 1,
      "time_range": "5-10s",
      "description": "The right robot arm grasps a bottle from the box.",
      "key_changes": "The right robot arm grasps a bottle.",
      "camera": "Static wide shot.",
      "robot_motion": "The right robot arm grasps a bottle.",
      "object_motion": "None.",
      "physical_logic": "The robot arm grasps a bottle."
    },
    {
      "segment_index": 2,
      "time_range": "10-15s",
      "description": "The right robot arm lifts the bottle out of the box.",
      "key_changes": "The right robot arm lifts the bottle.",
      "camera": "Static wide shot.",
      "robot_motion": "The right robot arm lifts the bottle.",
      "object_motion": "The bottle is lifted out of the box.",
      "physical_logic": "The robot arm lifts the bottle."
    }
  ],
  "transitions": [],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "16,9",
  "duration": "15.00s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Two robot arms, one black and one white and black.",
    "end_effector": "Grippers.",
    "manipulated_objects": "Plastic bottles.",
    "task_goal": "Pick up a bottle from a box.",
    "initial_state": "The robot arms are stationary, with the right arm moving towards the box.",
    "terminal_state": "The right robot arm has lifted a bottle out of the box.",
    "affordances": "The robot arms can grasp and lift objects.",
    "failure_modes": "None observed."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The right robot arm grasps a bottle from the box.",
    "support_and_stability": "The robot arms are supported by their bases.",
    "force_gravity_friction": "The robot arms apply force to lift the bottle.",
    "causal_chain": "The robot arm moves towards the box, grasps a bottle, and lifts it.",
    "object_permanence": "The bottle is lifted out of the box.",
    "next_action_prediction": "The robot arm will place the bottle on the table."
  },
  "uncertainties": [],
  "comprehensive_t2v_caption": "A white and black robot arm moves towards a cardboard box filled with clear plastic bottles, grasps one, and lifts it out."
}

Cosmos 3 Super (64B) / p09: Reproduction schema

ok
p09: Reproduction schema · Cosmos 3 Super (64B) · 14.8s · 2026-06-10T18:55:51+0000

Short Read

Two robot arms are positioned over a cardboard box filled with water bottles on a white table, with one arm reaching into the box and grasping a bottle.

Training Caption Field

A top-down view of two robot arms positioned on either side of a cardboard box filled with water bottles on a white table. The robot arm on the right reaches into the box and grasps a water bottle.

Full output JSON/text
{
  "schema_version": "p09_reproduction_v1",
  "one_sentence_summary": "Two robot arms are positioned over a cardboard box filled with water bottles on a white table, with one arm reaching into the box and grasping a bottle.",
  "video_metadata": {
    "resolution": {
      "W": 1152,
      "H": 816
    },
    "aspect_ratio": "16,9",
    "duration_sec": 12.167,
    "fps": 30.0,
    "frame_count": 365
  },
  "camera": {
    "viewpoint": "top-down",
    "framing": "medium shot",
    "camera_angle": "bird's-eye",
    "camera_motion": "static",
    "estimated_height_or_distance": "1.5 meters",
    "lens_or_distortion": "none",
    "crop_boundaries": "full frame",
    "stability": "stable"
  },
  "scene_layout": {
    "coordinate_frame": "2D image plane",
    "background": "white table surface",
    "surface": "white table",
    "lighting": "bright indoor lighting",
    "global_spatial_map": "two robot arms on either side of a cardboard box filled with water bottles",
    "occlusions": "none"
  },
  "subjects": [
    {
      "id": "robot_arm_left",
      "category": "robot arm",
      "appearance": "black and silver metallic",
      "visible_parts": "arm, gripper",
      "initial_pose": "resting on left side of box",
      "final_pose": "resting on left side of box",
      "workspace_location": "left side of box",
      "motion_path": "none",
      "end_effector": "gripper",
      "uncertainties": "none"
    },
    {
      "id": "robot_arm_right",
      "category": "robot arm",
      "appearance": "black and silver metallic",
      "visible_parts": "arm, gripper",
      "initial_pose": "resting on right side of box",
      "final_pose": "grasping a bottle",
      "workspace_location": "right side of box",
      "motion_path": "reaches into box, grasps bottle",
      "end_effector": "gripper",
      "uncertainties": "none"
    }
  ],
  "objects": [
    {
      "id": "cardboard_box",
      "category": "box",
      "appearance": "brown cardboard",
      "label_or_text": "none",
      "material": "cardboard",
      "shape": "rectangular",
      "size_relative_to_frame": "medium",
      "initial_pose": "center of frame",
      "final_pose": "center of frame",
      "state_changes": "none",
      "uncertainties": "none"
    },
    {
      "id": "water_bottles",
      "category": "bottles",
      "appearance": "clear plastic with white caps",
      "label_or_text": "none",
      "material": "plastic",
      "shape": "cylindrical",
      "size_relative_to_frame": "small",
      "initial_pose": "inside box and on table",
      "final_pose": "inside box and on table",
      "state_changes": "one bottle is grasped by robot arm right",
      "uncertainties": "none"
    }
  ],
  "initial_state": "Two robot arms are positioned on either side of a cardboard box filled with water bottles on a white table.",
  "final_state": "Robot arm right has grasped a water bottle from the box.",
  "timeline": [
    {
      "time_range": "0-4s",
      "description": "Robot arms are stationary.",
      "active_subjects": [],
      "object_states": [],
      "robot_motion": "none",
      "object_motion": "none",
      "gripper_state": "open",
      "contact_state": "none",
      "spatial_changes": "none",
      "physical_causality": "none",
      "evidence": "none"
    },
    {
      "time_range": "4-8s",
      "description": "Robot arm right reaches into the box.",
      "active_subjects": [
        "robot_arm_right"
      ],
      "object_states": [],
      "robot_motion": "reaching",
      "object_motion": "none",
      "gripper_state": "open",
      "contact_state": "none",
      "spatial_changes": "robot arm right moves closer to box",
      "physical_causality": "none",
      "evidence": "none"
    },
    {
      "time_range": "8-12s",
      "description": "Robot arm right grasps a water bottle.",
      "active_subjects": [
        "robot_arm_right"
      ],
      "object_states": [],
      "robot_motion": "grasping",
      "object_motion": "none",
      "gripper_state": "closed",
      "contact_state": "contact with bottle",
      "spatial_changes": "bottle is lifted slightly",
      "physical_causality": "none",
      "evidence": "none"
    }
  ],
  "parallel_action_graph": {
    "simultaneous_actions": [],
    "dependencies": [],
    "ordering_constraints": []
  },
  "keyframes": [
    {
      "time": 0,
      "frame_description": "Robot arms are stationary on either side of the box.",
      "subject_poses": [
        {
          "id": "robot_arm_left",
          "pose": "resting on left side of box"
        },
        {
          "id": "robot_arm_right",
          "pose": "resting on right side of box"
        }
      ],
      "object_poses": [
        {
          "id": "cardboard_box",
          "pose": "center of frame"
        },
        {
          "id": "water_bottles",
          "pose": "inside box and on table"
        }
      ],
      "contacts": [],
      "important_visual_details": []
    },
    {
      "time": 4,
      "frame_description": "Robot arm right begins to reach into the box.",
      "subject_poses": [
        {
          "id": "robot_arm_left",
          "pose": "resting on left side of box"
        },
        {
          "id": "robot_arm_right",
          "pose": "reaching into box"
        }
      ],
      "object_poses": [
        {
          "id": "cardboard_box",
          "pose": "center of frame"
        },
        {
          "id": "water_bottles",
          "pose": "inside box and on table"
        }
      ],
      "contacts": [],
      "important_visual_details": []
    },
    {
      "time": 8,
      "frame_description": "Robot arm right grasps a water bottle.",
      "subject_poses": [
        {
          "id": "robot_arm_left",
          "pose": "resting on left side of box"
        },
        {
          "id": "robot_arm_right",
          "pose": "grasping bottle"
        }
      ],
      "object_poses": [
        {
          "id": "cardboard_box",
          "pose": "center of frame"
        },
        {
          "id": "water_bottles",
          "pose": "inside box and on table"
        }
      ],
      "contacts": [
        {
          "id": "robot_arm_right",
          "contact": "contact with bottle"
        }
      ],
      "important_visual_details": []
    }
  ],
  "contact_events": [
    {
      "time": 8,
      "subject_id": "robot_arm_right",
      "object_id": "water_bottles",
      "contact_type": "grasp",
      "contact_location": "bottle cap",
      "grasp_or_support_quality": "secure",
      "release_or_transfer": "none",
      "visible_evidence": "gripper closes around bottle"
    }
  ],
  "physics_and_affordances": {
    "support_surfaces": "table",
    "gravity_and_stability": "stable",
    "force_or_constraint_reasoning": "none",
    "collision_or_near_collision": "none",
    "object_permanence": "none",
    "task_goal": "grasp water bottle",
    "failure_modes": "none",
    "next_action_prediction": "lift bottle out of box"
  },
  "appearance_fidelity": {
    "colors": "black, silver, brown, clear, white",
    "textures": "metallic, cardboard, plastic",
    "lighting_shadows": "bright indoor lighting, soft shadows",
    "robot_morphology": "two arms with grippers",
    "object_fine_details": "clear plastic bottles with white caps",
    "background_fine_details": "white table surface"
  },
  "text_and_labels": [],
  "audio_description": null,
  "generation_controls": {
    "shot_type": "medium shot",
    "camera_motion_instruction": "static",
    "motion_style": "none",
    "duration_instruction": "12.167s",
    "spatial_consistency_requirements": "none",
    "negative_constraints": "none",
    "recommended_caption_style": "none"
  },
  "uncertainties": [],
  "comprehensive_reproduction_caption": "A top-down view of two robot arms positioned on either side of a cardboard box filled with water bottles on a white table. The robot arm on the right reaches into the box and grasps a water bottle."
}
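
Review note: in the p09 output above, the `contact_events` entry at time 8 is mirrored by a `contacts` entry in the keyframe with the same time. A minimal sketch of how that agreement could be checked automatically is shown below. The function name `unmatched_contact_events` is hypothetical, and the sample data copies only the fields visible in this output; it is not part of the review pipeline.

```python
def unmatched_contact_events(keyframes, contact_events):
    """Return contact events with no matching per-keyframe contact.

    A match means some keyframe has the same "time" and lists the event's
    "subject_id" as the "id" of one of its "contacts" entries.
    """
    seen = {
        (frame["time"], contact["id"])
        for frame in keyframes
        for contact in frame.get("contacts", [])
    }
    return [
        event
        for event in contact_events
        if (event["time"], event["subject_id"]) not in seen
    ]


# Sample data copied from the time-8 keyframe and contact event above.
keyframes = [
    {
        "time": 8,
        "contacts": [
            {"id": "robot_arm_right", "contact": "contact with bottle"}
        ],
    }
]
contact_events = [
    {"time": 8, "subject_id": "robot_arm_right", "contact_type": "grasp"}
]

print(unmatched_contact_events(keyframes, contact_events))
```

An empty list means every contact event is backed by a keyframe contact; any returned event would be worth a manual look.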

rovid_008

Qwen 3.5 9B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 9B · 67.6s · 2026-06-10T17:50:15+0000

Short Read

At the beginning, the robot arm hovers over the table. In the early phase, the gripper descends and grasps the red chili pepper from the cutting board. By the middle, the robot lifts the pepper off the board. Towards the end, the robot moves the pepper towards the open door of the black microwave and places it inside.

Training Caption Field

A white industrial robot arm with a black gripper stands on a table covered in a grassy texture, facing a wooden cutting board with a red chili pepper. The robot extends its arm, closes its gripper around the pepper, and lifts it off the board. It then transports the pepper towards a black microwave oven with an open door and carefully places the pepper inside the microwave cavity. The scene is brightly lit, with a beige microwave and a white fluffy rug visible in the background.

Full output JSON/text
{
  "scene_imagination": "A white industrial robot arm with a black gripper is positioned on the left side of a table covered in a grassy texture. On the table sits a wooden cutting board with a single red chili pepper. Behind the board are two microwave ovens, one black and one beige, both with doors open. A fluffy white rug or sheepskin lies in the foreground. The lighting is bright and even, suggesting an indoor simulation or studio setup.",
  "temporal_caption": "At the beginning, the robot arm hovers over the table. In the early phase, the gripper descends and grasps the red chili pepper from the cutting board. By the middle, the robot lifts the pepper off the board. Towards the end, the robot moves the pepper towards the open door of the black microwave and places it inside.",
  "audio_description": null,
  "subjects": [
    {
      "description": "Industrial robot arm",
      "appearance_details": "White segmented arm with black joints and a black two-finger gripper",
      "relationship": "Manipulator of the chili pepper",
      "location": "Left side of the frame, standing on the table surface",
      "relative_size": "Large compared to the pepper, medium compared to the microwaves",
      "orientation": "Angled towards the center-right of the table",
      "pose": "Dynamic, moving from a static hover to an active grasp and lift",
      "action": "Grasping and moving the chili pepper",
      "state_changes": "Moves from a stationary position to holding an object, then moves towards the microwave"
    },
    {
      "description": "Red chili pepper",
      "appearance_details": "Bright red, elongated vegetable with green stem",
      "relationship": "Object being manipulated",
      "location": "Initially on the wooden cutting board, later inside the microwave",
      "relative_size": "Small relative to the robot arm",
      "orientation": "Horizontal on the board, vertical when held by the gripper",
      "pose": "Static on the board, then lifted and moved",
      "action": "Being picked up and placed",
      "state_changes": "Moves from resting on the board to being held by the gripper, then placed inside the microwave"
    }
  ],
  "background_setting": "An indoor environment featuring a table with a printed grass texture. Two microwave ovens (one black, one beige) are placed against a dark brown wall or backdrop. A white fluffy rug is visible in the immediate foreground.",
  "lighting": {
    "conditions": "Bright, diffuse indoor lighting",
    "direction": "Overhead and slightly from the front",
    "shadows": "Soft shadows cast by the robot arm and objects onto the table surface",
    "illumination_effect": "Clear visibility of all objects with minimal harsh contrast"
  },
  "aesthetics": {
    "composition": "Centered on the action of the robot arm, with the microwaves providing context in the background",
    "color_scheme": "Natural tones (green grass, brown wood) contrasted with the white robot and red pepper",
    "mood_atmosphere": "Clinical, functional, and focused on task execution",
    "patterns": "Grassy texture on the table, segmented joints on the robot arm"
  },
  "cinematography": {
    "framing": "Medium shot capturing the robot arm, table, and microwaves",
    "camera_angle": "Eye-level or slightly elevated angle looking down at the table",
    "depth_of_field": "Deep focus, keeping the robot, pepper, and microwaves sharp",
    "focus": "Sharp focus on the robot arm and the chili pepper",
    "lens_focal_length": "Standard focal length, likely 35mm or 50mm equivalent",
    "camera_motion": "Static camera position throughout the clip"
  },
  "style_medium": "robotics video",
  "artistic_style": "synthetic robot simulation",
  "context": "A demonstration of robotic manipulation skills, specifically object grasping and placement into an appliance.",
  "actions": [
    {
      "time": "0.0s - 1.5s",
      "description": "The robot arm moves its gripper towards the red chili pepper on the cutting board.",
      "actor": "Robot arm",
      "object": "Red chili pepper",
      "contact_event": "Gripper approaches and makes contact with the pepper",
      "state_change": "No change yet, preparation for grasp",
      "evidence": "Gripper fingers closing around the pepper"
    },
    {
      "time": "1.5s - 3.0s",
      "description": "The robot successfully grasps the chili pepper and lifts it off the cutting board.",
      "actor": "Robot arm",
      "object": "Red chili pepper",
      "contact_event": "Firm grasp established",
      "state_change": "Pepper is lifted into the air",
      "evidence": "Pepper is no longer touching the board"
    },
    {
      "time": "3.0s - 6.0s",
      "description": "The robot moves the pepper towards the black microwave and places it inside.",
      "actor": "Robot arm",
      "object": "Red chili pepper",
      "contact_event": "Pepper is placed inside the microwave cavity",
      "state_change": "Pepper is now inside the microwave",
      "evidence": "Pepper is visible inside the microwave door opening"
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "0.0s - 1.5s",
      "description": "Robot arm approaches the chili pepper on the cutting board.",
      "key_changes": "Gripper moves from a neutral position to a grasping position",
      "camera": "Static",
      "robot_motion": "Arm extension and wrist rotation",
      "object_motion": "None",
      "physical_logic": "Approach trajectory aligns with the object's location"
    },
    {
      "segment_index": 2,
      "time_range": "1.5s - 3.0s",
      "description": "Robot grasps and lifts the chili pepper.",
      "key_changes": "Pepper is lifted off the surface",
      "camera": "Static",
      "robot_motion": "Vertical lift and slight retraction",
      "object_motion": "Lifted upwards",
      "physical_logic": "Grasp force exceeds gravity, lifting the object"
    },
    {
      "segment_index": 3,
      "time_range": "3.0s - 6.0s",
      "description": "Robot transports the pepper to the microwave and places it inside.",
      "key_changes": "Pepper is placed inside the microwave",
      "camera": "Static",
      "robot_motion": "Horizontal translation towards the microwave",
      "object_motion": "Moved from table to microwave interior",
      "physical_logic": "Object is placed within the bounds of the microwave opening"
    }
  ],
  "transitions": [
    "Approach to grasp",
    "Lift and transport",
    "Placement inside appliance"
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "4:3",
  "duration": "6.033s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Six-axis articulated arm with a white body and black joints",
    "end_effector": "Black two-finger parallel gripper",
    "manipulated_objects": [
      "Red chili pepper"
    ],
    "task_goal": "Place the chili pepper inside the microwave",
    "initial_state": "Robot arm hovering above the cutting board, pepper resting on the board",
    "terminal_state": "Robot arm holding the pepper inside the microwave",
    "affordances": "Gripper provides a secure hold for the pepper; microwave provides a target container",
    "failure_modes": "None observed in this clip"
  },
  "physical_reasoning": {
    "contact_and_grasp": "The gripper makes firm contact with the pepper, lifting it cleanly without slipping.",
    "support_and_stability": "The robot arm maintains stability while lifting and transporting the object.",
    "force_gravity_friction": "The robot overcomes gravity to lift the pepper and uses friction to hold it securely.",
    "causal_chain": "Approach -> Grasp -> Lift -> Transport -> Place",
    "object_permanence": "The pepper remains intact throughout the manipulation.",
    "next_action_prediction": "The robot may close the microwave door or initiate a cooking cycle."
  },
  "uncertainties": [
    "The specific model of the robot arm is not identifiable from the visual data."
  ],
  "comprehensive_t2v_caption": "A white industrial robot arm with a black gripper stands on a table covered in a grassy texture, facing a wooden cutting board with a red chili pepper. The robot extends its arm, closes its gripper around the pepper, and lifts it off the board. It then transports the pepper towards a black microwave oven with an open door and carefully places the pepper inside the microwave cavity. The scene is brightly lit, with a beige microwave and a white fluffy rug visible in the background."
}
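
Review note: the p08 outputs for this clip write time ranges in different formats (`0.0s - 1.5s`, `00:00 - 00:02`, `0.0s-2.0s`), which makes segment coverage hard to compare by eye. The sketch below normalizes those three formats to seconds and checks that consecutive segments are contiguous and end near the reported duration. The helper names and the 0.1 s tolerance are assumptions made for illustration.

```python
import re
from typing import List, Tuple


def to_seconds(token: str) -> float:
    """Convert '1.5s' or 'MM:SS' into seconds."""
    token = token.strip()
    if ":" in token:
        minutes, seconds = token.split(":")
        return int(minutes) * 60 + float(seconds)
    return float(token.rstrip("s"))


def parse_range(text: str) -> Tuple[float, float]:
    """Split 'start - end', with or without spaces around the dash."""
    start, end = re.split(r"\s*-\s*", text.strip(), maxsplit=1)
    return to_seconds(start), to_seconds(end)


def coverage_problems(
    ranges: List[str], duration: str, tolerance: float = 0.1
) -> List[str]:
    """Report gaps or overlaps between segments and a mismatched final end."""
    problems = []
    previous_end = 0.0
    for start, end in (parse_range(r) for r in ranges):
        if abs(start - previous_end) > tolerance:
            problems.append(
                f"segment starts at {start}s but previous ended at {previous_end}s"
            )
        previous_end = end
    total = to_seconds(duration)
    if abs(total - previous_end) > tolerance:
        problems.append(f"segments end at {previous_end}s but clip lasts {total}s")
    return problems


# Segment ranges and duration copied from the Qwen 3.5 9B output above.
qwen_9b_segments = ["0.0s - 1.5s", "1.5s - 3.0s", "3.0s - 6.0s"]
print(coverage_problems(qwen_9b_segments, "6.033s"))
```

The same function can be pointed at the `actions` time fields to see whether an output's actions and segments agree with each other.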

Qwen 3.5 27B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 27B · 150.6s · 2026-06-10T18:19:41+0000

Short Read

At the beginning, the robotic arm is raised above the scene. In the early phase, the arm descends towards the cutting board. In the middle, the gripper closes around the red chili pepper. In the late phase, the arm lifts the pepper and moves it towards the microwave. By the end, the pepper is placed inside the microwave cavity.

Training Caption Field

A white robotic arm with a black gripper operates in a surreal environment featuring a grassy floor and a wooden cutting board. Initially, the arm descends towards a red chili pepper resting on the board. The gripper closes securely around the pepper, lifting it from the surface. The arm then transports the pepper across the frame to the right, where a white microwave with an open door is situated. Finally, the robot carefully places the chili pepper inside the microwave cavity, completing the pick-and-place sequence.

Full output JSON/text
{
  "scene_imagination": "A white industrial robotic arm with a black parallel-jaw gripper is positioned on the left side of the frame. The environment is a surreal, composite setting featuring a grassy ground with wooden planks, a white microwave with its door open on the right, and a wooden cutting board in the center. A single red chili pepper rests on the cutting board. The background includes a dark, textured wall and a patch of white fur in the foreground.",
  "temporal_caption": "At the beginning, the robotic arm is raised above the scene. In the early phase, the arm descends towards the cutting board. In the middle, the gripper closes around the red chili pepper. In the late phase, the arm lifts the pepper and moves it towards the microwave. By the end, the pepper is placed inside the microwave cavity.",
  "audio_description": null,
  "subjects": [
    {
      "description": "A white robotic arm with a black gripper.",
      "appearance_details": "White segmented body, black joints, black parallel-jaw gripper, green indicator light on the base.",
      "relationship": "Manipulator of the chili pepper.",
      "location": "Left side of the frame, extending towards the center.",
      "relative_size": "Large, dominating the left half of the frame.",
      "orientation": "Facing right towards the microwave.",
      "pose": "Articulated arm extended.",
      "action": "Grasping and moving the chili pepper.",
      "state_changes": "Moves from idle to grasping to placing."
    },
    {
      "description": "A red chili pepper.",
      "appearance_details": "Bright red, elongated shape, green stem.",
      "relationship": "Object being manipulated.",
      "location": "Initially on the cutting board, finally inside the microwave.",
      "relative_size": "Small.",
      "orientation": "Lying flat initially, held vertically then placed inside.",
      "pose": "Static on board, then held.",
      "action": "Being picked up and moved.",
      "state_changes": "Changes location from cutting board to microwave."
    },
    {
      "description": "A white microwave oven.",
      "appearance_details": "White exterior, black interior, door open to the left.",
      "relationship": "Destination for the object.",
      "location": "Right side of the frame.",
      "relative_size": "Medium.",
      "orientation": "Facing left.",
      "pose": "Stationary.",
      "action": "Receiving the object.",
      "state_changes": "Interior becomes occupied."
    }
  ],
  "background_setting": "A surreal, mixed-reality environment combining a grassy outdoor floor with wooden planks, a kitchen appliance (microwave), and a dark, rocky-looking wall background.",
  "lighting": {
    "conditions": "Artificial, diffuse lighting.",
    "direction": "Top-down and slightly from the front.",
    "shadows": "Soft shadows cast by the robot arm and the microwave.",
    "illumination_effect": "Evenly illuminates the scene, highlighting the white robot and red pepper."
  },
  "aesthetics": {
    "composition": "Asymmetrical balance with the robot on the left and microwave on the right.",
    "color_scheme": "Contrast between the white robot/microwave, green grass, and red pepper.",
    "mood_atmosphere": "Clinical yet surreal due to the mismatched environment.",
    "patterns": "Linear patterns of the wooden planks and grass texture."
  },
  "cinematography": {
    "framing": "Medium shot capturing the robot, cutting board, and microwave.",
    "camera_angle": "High angle, looking down at the workspace.",
    "depth_of_field": "Deep depth of field, keeping all elements in focus.",
    "focus": "Sharp focus on the robot and objects.",
    "lens_focal_length": "Standard to wide angle.",
    "camera_motion": "Static camera."
  },
  "style_medium": "robotics video",
  "artistic_style": "synthetic robot simulation",
  "context": "The robot is performing a pick-and-place task, moving a food item from a preparation surface to an appliance.",
  "actions": [
    {
      "time": "00:00 - 00:02",
      "description": "The robot arm moves down towards the cutting board.",
      "actor": "Robot arm",
      "object": "Chili pepper",
      "contact_event": "Approaching",
      "state_change": "None",
      "evidence": "Arm trajectory towards the object."
    },
    {
      "time": "00:02 - 00:03",
      "description": "The gripper closes around the chili pepper.",
      "actor": "Robot gripper",
      "object": "Chili pepper",
      "contact_event": "Grasping",
      "state_change": "Object held",
      "evidence": "Fingers closing on the pepper."
    },
    {
      "time": "00:03 - 00:06",
      "description": "The robot lifts the pepper and places it inside the microwave.",
      "actor": "Robot arm",
      "object": "Chili pepper",
      "contact_event": "Placing",
      "state_change": "Object moved to microwave",
      "evidence": "Arm moving pepper into the microwave cavity."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "00:00 - 00:02",
      "description": "Robot arm approaches the cutting board.",
      "key_changes": "Arm movement.",
      "camera": "Static.",
      "robot_motion": "Descending.",
      "object_motion": "None.",
      "physical_logic": "Approach phase of manipulation."
    },
    {
      "segment_index": 2,
      "time_range": "00:02 - 00:04",
      "description": "Robot grasps the chili pepper.",
      "key_changes": "Gripper closes.",
      "camera": "Static.",
      "robot_motion": "Gripping.",
      "object_motion": "None.",
      "physical_logic": "Contact and grasp."
    },
    {
      "segment_index": 3,
      "time_range": "00:04 - 00:06",
      "description": "Robot places pepper in microwave.",
      "key_changes": "Object relocation.",
      "camera": "Static.",
      "robot_motion": "Lifting and placing.",
      "object_motion": "Moving to microwave.",
      "physical_logic": "Transport and placement."
    }
  ],
  "transitions": [
    "Approach to grasp",
    "Grasp to lift",
    "Lift to place"
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "16:9",
  "duration": "6.033s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "White 6-DOF robotic arm with black gripper.",
    "end_effector": "Parallel-jaw gripper.",
    "manipulated_objects": "Red chili pepper.",
    "task_goal": "Move chili pepper from cutting board to microwave.",
    "initial_state": "Pepper on board, microwave empty.",
    "terminal_state": "Pepper inside microwave.",
    "affordances": "Grasping, lifting, placing.",
    "failure_modes": "None observed."
  },
  "physical_reasoning": {
    "contact_and_grasp": "Gripper successfully contacts and secures the pepper.",
    "support_and_stability": "Pepper is stable on the board until grasped.",
    "force_gravity_friction": "Robot overcomes gravity to lift the pepper.",
    "causal_chain": "Grasp leads to lift, which leads to placement.",
    "object_permanence": "Pepper remains visible throughout the move.",
    "next_action_prediction": "Robot may close the microwave door."
  },
  "uncertainties": [
    "The environment appears to be a synthetic or composite simulation rather than a real physical space."
  ],
  "comprehensive_t2v_caption": "A white robotic arm with a black gripper operates in a surreal environment featuring a grassy floor and a wooden cutting board. Initially, the arm descends towards a red chili pepper resting on the board. The gripper closes securely around the pepper, lifting it from the surface. The arm then transports the pepper across the frame to the right, where a white microwave with an open door is situated. Finally, the robot carefully places the chili pepper inside the microwave cavity, completing the pick-and-place sequence."
}
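
Review note: this output reports `aspect_ratio` as 16:9 while its own `resolution` fields give 1152 x 816, and the Qwen 3.5 9B output for the same clip reports 4:3. A short sketch of a consistency check follows. The function names are hypothetical; the arithmetic uses only the width and height recorded above.

```python
from math import gcd


def reduced_ratio(width: int, height: int) -> str:
    """Reduce width:height to lowest terms."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


def relative_error(claimed: str, width: int, height: int) -> float:
    """Relative difference between a claimed 'W:H' label and the pixel ratio."""
    numerator, denominator = claimed.split(":")
    claimed_value = int(numerator) / int(denominator)
    actual_value = width / height
    return abs(claimed_value - actual_value) / actual_value


print(reduced_ratio(1152, 816))
print(round(relative_error("16:9", 1152, 816), 3))
print(round(relative_error("4:3", 1152, 816), 3))
```

Neither label is exact for 1152 x 816, which reduces to 24:17, but 4:3 is off by roughly 6% while 16:9 is off by roughly 26%.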

Cosmos 3 Nano (16B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Nano (16B) · 14.6s · 2026-06-10T18:37:24+0000

Short Read

The robotic arm moves towards the red chili pepper on the wooden cutting board, grasps it with its gripper, lifts it off the board, and then moves it towards the microwave oven.

Training Caption Field

A robotic arm with a white and gray body and a black and white gripper is shown manipulating a red chili pepper on a wooden cutting board. The scene is set outdoors on a grassy surface with a brick pattern on the left and a fluffy white object in the foreground. The robotic arm moves towards the chili pepper, grasps it with its gripper, lifts it off the cutting board, and moves it towards a microwave oven with its door open. The lighting is bright and even, highlighting the details of the robotic arm and the objects it interacts with.

Full output JSON/text
{
  "scene_imagination": "A robotic arm with a gripper is positioned over a wooden cutting board on a grassy surface. The robot is attempting to pick up a red chili pepper from the board. A microwave oven is visible to the right, and a fluffy white object, possibly a sheep or sheepskin, lies in the foreground. The setting appears to be an outdoor or semi-outdoor environment with artificial grass and a brick-like pattern on the left side.",
  "temporal_caption": "The robotic arm moves towards the red chili pepper on the wooden cutting board, grasps it with its gripper, lifts it off the board, and then moves it towards the microwave oven.",
  "audio_description": "The sound of the robotic arm moving and the gripper closing around the chili pepper.",
  "subjects": [
    {
      "description": "A robotic arm with a white and gray body, featuring a black and white gripper at the end.",
      "appearance_details": "The arm has a segmented design with visible joints and a green light indicator near the base. The gripper is black with white accents and has a claw-like structure.",
      "relationship": "Interacting with the red chili pepper on the cutting board.",
      "location": "Center-left of the frame, extending towards the center.",
      "relative_size": "Large within the frame.",
      "orientation": "Facing towards the right and slightly downwards.",
      "pose": "Extended towards the cutting board.",
      "action": "Moving towards the chili pepper, grasping it, and lifting it.",
      "state_changes": "The arm moves from a resting position to grasping and lifting the chili pepper."
    },
    {
      "description": "A red chili pepper lying on a wooden cutting board.",
      "appearance_details": "Bright red color with a green stem attached.",
      "relationship": "Being manipulated by the robotic arm.",
      "location": "Center of the frame, on the wooden cutting board.",
      "relative_size": "Small within the frame.",
      "orientation": "Lying flat on the board.",
      "pose": "Static until picked up.",
      "action": "Being grasped and lifted by the robotic arm.",
      "state_changes": "Transitions from lying on the board to being held by the robotic arm."
    },
    {
      "description": "A wooden cutting board placed on a grassy surface.",
      "appearance_details": "Light brown wood with a rectangular shape.",
      "relationship": "Supporting the chili pepper.",
      "location": "Center of the frame.",
      "relative_size": "Medium within the frame.",
      "orientation": "Flat on the surface.",
      "pose": "Static.",
      "action": "None.",
      "state_changes": "No significant change."
    },
    {
      "description": "A microwave oven with its door open.",
      "appearance_details": "Light gray exterior with a transparent door and control knobs on the right side.",
      "relationship": "Receiving the chili pepper from the robotic arm.",
      "location": "Right side of the frame.",
      "relative_size": "Large within the frame.",
      "orientation": "Facing towards the left.",
      "pose": "Static.",
      "action": "None.",
      "state_changes": "No significant change."
    }
  ],
  "background_setting": "An outdoor or semi-outdoor setting with artificial grass covering the ground. There is a brick-like pattern on the left side and a fluffy white object, possibly a sheep or sheepskin, in the foreground. The environment appears to be a controlled area for testing robotic manipulation.",
  "lighting": {
    "conditions": "Bright daylight.",
    "direction": "Top-lit.",
    "shadows": "Soft shadows cast by the robotic arm and objects on the grass.",
    "illumination_effect": "Even illumination across the scene."
  },
  "aesthetics": {
    "composition": "Centered on the robotic arm and the cutting board, with the microwave on the right and the fluffy object in the foreground.",
    "color_scheme": "Natural tones with green grass, light brown wood, and a bright red chili pepper.",
    "mood_atmosphere": "Neutral and focused on the task.",
    "patterns": "Grass texture and brick pattern on the left."
  },
  "cinematography": {
    "camera_angle": "High angle.",
    "framing": "Wide shot.",
    "depth_of_field": "Deep.",
    "focus": "Sharp focus on the robotic arm and the objects it interacts with.",
    "lens_focal_length": "Wide-angle.",
    "camera_motion": "Static."
  },
  "style_medium": "Live-action video.",
  "artistic_style": "Realistic.",
  "context": "A demonstration of a robotic arm performing a manipulation task, likely for testing or educational purposes.",
  "actions": [
    {
      "time": "0.0s-2.0s",
      "description": "The robotic arm moves towards the red chili pepper on the wooden cutting board.",
      "actor": "Robotic arm.",
      "object": "Red chili pepper.",
      "contact_event": "The gripper approaches the chili pepper.",
      "state_change": "The arm transitions from a resting position to moving towards the object."
    },
    {
      "time": "2.0s-4.0s",
      "description": "The robotic arm grasps the red chili pepper with its gripper.",
      "actor": "Robotic arm.",
      "object": "Red chili pepper.",
      "contact_event": "The gripper closes around the chili pepper.",
      "state_change": "The chili pepper is lifted off the cutting board."
    },
    {
      "time": "4.0s-6.0s",
      "description": "The robotic arm lifts the red chili pepper and moves it towards the microwave oven.",
      "actor": "Robotic arm.",
      "object": "Red chili pepper.",
      "contact_event": "The chili pepper is held by the gripper.",
      "state_change": "The chili pepper is moved from the cutting board to the microwave."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "0.0s-2.0s",
      "description": "The robotic arm moves towards the red chili pepper on the wooden cutting board.",
      "key_changes": "The arm transitions from a resting position to moving towards the object.",
      "camera": "Static high angle wide shot.",
      "robot_motion": "The robotic arm extends towards the chili pepper.",
      "object_motion": "The chili pepper remains stationary on the cutting board.",
      "physical_reasoning": "The arm moves to grasp the object."
    },
    {
      "segment_index": 2,
      "time_range": "2.0s-4.0s",
      "description": "The robotic arm grasps the red chili pepper with its gripper.",
      "key_changes": "The gripper closes around the chili pepper, lifting it off the cutting board.",
      "camera": "Static high angle wide shot.",
      "robot_motion": "The gripper closes and lifts the chili pepper.",
      "object_motion": "The chili pepper is lifted off the cutting board.",
      "physical_reasoning": "The gripper applies force to pick up the object."
    },
    {
      "segment_index": 3,
      "time_range": "4.0s-6.0s",
      "description": "The robotic arm lifts the red chili pepper and moves it towards the microwave oven.",
      "key_changes": "The chili pepper is moved from the cutting board to the microwave.",
      "camera": "Static high angle wide shot.",
      "robot_motion": "The arm moves the chili pepper towards the microwave.",
      "object_motion": "The chili pepper is transported to the microwave.",
      "physical_reasoning": "The arm moves the object to its target location."
    }
  ],
  "transitions": [],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "4:3",
  "duration": "6.033s",
  "fps": 30,
  "robotics": {
    "embodiment": "Manipulation robot",
    "end_effector": "Gripper",
    "manipulated_objects": "Red chili pepper",
    "task_goal": "Pick up the chili pepper and move it to the microwave.",
    "initial_state": "The robotic arm is positioned over the cutting board with the chili pepper.",
    "terminal_state": "The robotic arm has lifted the chili pepper and is moving it towards the microwave.",
    "affordances": "The chili pepper is grasped by the gripper.",
    "failure_modes": "The gripper may fail to grasp the chili pepper or may drop it."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The gripper makes contact with the chili pepper and applies force to grasp it.",
    "support_and_stability": "The cutting board provides a stable surface for the chili pepper.",
    "force_gravity_friction": "Gravity pulls the chili pepper down, but the gripper applies enough force to lift it.",
    "causal_chain": "The arm moves, grasps the chili pepper, and lifts it.",
    "object_permanence": "The chili pepper remains visible and is moved from one location to another.",
    "next_action_prediction": "The robotic arm will continue moving the chili pepper towards the microwave and place it inside."
  },
  "uncertainties": [],
  "comprehensive_t2v_caption": "A robotic arm with a white and gray body and a black and white gripper is shown manipulating a red chili pepper on a wooden cutting board. The scene is set outdoors on a grassy surface with a brick pattern on the left and a fluffy white object in the foreground. The robotic arm moves towards the chili pepper, grasps it with its gripper, lifts it off the cutting board, and moves it towards a microwave oven with its door open. The lighting is bright and even, highlighting the details of the robotic arm and the objects it interacts with."
}
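
Review note: the Cosmos 3 Nano output above drifts from the key names used by the Qwen outputs for the same prompt. Its `segments` entries use `physical_reasoning` where the Qwen outputs use `physical_logic`, and its `actions` entries omit `evidence`. The sketch below reports such drift for a single record. The expected key set is an assumption taken from the Qwen 3.5 9B output, not from the p08 schema definition, and `key_drift` is a hypothetical helper.

```python
def key_drift(expected_keys, record):
    """Return keys missing from, and unexpected in, one JSON object."""
    expected = set(expected_keys)
    present = set(record)
    return {
        "missing": sorted(expected - present),
        "extra": sorted(present - expected),
    }


# Assumed reference: the segment keys used by the Qwen 3.5 9B output.
EXPECTED_SEGMENT_KEYS = [
    "segment_index",
    "time_range",
    "description",
    "key_changes",
    "camera",
    "robot_motion",
    "object_motion",
    "physical_logic",
]

# First segment of the Cosmos 3 Nano output, copied from above.
nano_segment = {
    "segment_index": 1,
    "time_range": "0.0s-2.0s",
    "description": "The robotic arm moves towards the red chili pepper on the wooden cutting board.",
    "key_changes": "The arm transitions from a resting position to moving towards the object.",
    "camera": "Static high angle wide shot.",
    "robot_motion": "The robotic arm extends towards the chili pepper.",
    "object_motion": "The chili pepper remains stationary on the cutting board.",
    "physical_reasoning": "The arm moves to grasp the object.",
}

print(key_drift(EXPECTED_SEGMENT_KEYS, nano_segment))
```

A similar comparison on value types would flag `manipulated_objects`, which is a list in the Qwen 3.5 9B output and a plain string in the Qwen 3.5 27B and Cosmos 3 Nano outputs.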

Cosmos 3 Super (64B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Super (64B) · 19.0s · 2026-06-10T16:14:32+0000

Short Read

The robot arm descends towards the wooden cutting board, grasps the red chili pepper, lifts it, and moves it towards the open microwave oven.

Training Caption Field

A white robotic arm with a black gripper is positioned on a grassy surface, interacting with a wooden cutting board that holds a single red chili pepper. To the right, a beige microwave oven sits with its door open. The robot arm moves from above the cutting board, grasps the chili pepper, lifts it, and moves it towards the open microwave. The background features a textured stone wall and a patterned rug with a sheepskin-like texture in the foreground. The lighting is even and bright, highlighting the objects and the robot's movements.

Full output JSON/text
{
  "scene_imagination": "A white robotic arm with a black gripper is positioned on a grassy surface, interacting with a wooden cutting board that holds a single red chili pepper. To the right, a beige microwave oven sits with its door open. The robot arm moves from above the cutting board, grasps the chili pepper, lifts it, and moves it toward the open microwave. The background features a textured stone wall and a patterned rug with a sheepskin-like texture in the foreground. The lighting is even and bright, highlighting the objects and the robot's movements.",
  "temporal_caption": "The robot arm descends towards the wooden cutting board, grasps the red chili pepper, lifts it, and moves it towards the open microwave oven.",
  "audio_description": "Mechanical whirring and clicking sounds from the robotic arm's joints and gripper, accompanied by a faint ambient hum.",
  "subjects": [
    {
      "description": "A white robotic arm with a black gripper, mounted on a base with a green indicator light.",
      "appearance_details": "The arm has multiple articulated joints, a cylindrical base with a green light, and a black gripper with yellow accents.",
      "relationship": "Interacting with the red chili pepper and the wooden cutting board.",
      "location": "Left side of the frame, extending towards the center.",
      "relative_size": "Large within frame",
      "orientation": "Facing right",
      "pose": "Extended towards the cutting board",
      "action": "Moving towards the chili pepper, grasping it, and lifting it.",
      "state_changes": "Starts extended, moves down to grasp the pepper, lifts it, and moves towards the microwave."
    },
    {
      "description": "A single red chili pepper resting on a wooden cutting board.",
      "appearance_details": "Bright red, elongated shape with a green stem.",
      "relationship": "Being manipulated by the robotic arm.",
      "location": "Center of the frame, on the wooden cutting board.",
      "relative_size": "Small within frame",
      "orientation": "Lying flat on the board",
      "pose": "Static until grasped",
      "action": "Being picked up by the robotic arm.",
      "state_changes": "Starts resting on the board, is grasped, lifted, and moved towards the microwave."
    },
    {
      "description": "A wooden cutting board with a natural wood grain pattern.",
      "appearance_details": "Rectangular shape, light brown color with visible wood grain.",
      "relationship": "Supporting the red chili pepper.",
      "location": "Center of the frame, on the grassy surface.",
      "relative_size": "Medium within frame",
      "orientation": "Flat on the surface",
      "pose": "Static",
      "action": "None",
      "state_changes": "No significant change."
    },
    {
      "description": "A beige microwave oven with its door open.",
      "appearance_details": "Rectangular shape, beige color, with control knobs on the right side and an open door revealing the interior.",
      "relationship": "Receiving the red chili pepper from the robotic arm.",
      "location": "Right side of the frame.",
      "relative_size": "Large within frame",
      "orientation": "Facing left",
      "pose": "Static with open door",
      "action": "None",
      "state_changes": "No significant change."
    }
  ],
  "background_setting": "An outdoor or semi-outdoor setting with a grassy surface, a textured stone wall in the background, and a patterned rug with a sheepskin-like texture in the foreground.",
  "lighting": {
    "conditions": "Bright, even lighting",
    "direction": "Top-lit",
    "shadows": "Soft shadows cast by the objects on the grassy surface",
    "illumination_effect": "Evenly illuminates the scene, highlighting the textures of the objects and the grass."
  },
  "aesthetics": {
    "composition": "Centered on the interaction between the robotic arm and the chili pepper, with the microwave on the right and the base on the left.",
    "color_scheme": "Natural tones with a bright red accent from the chili pepper.",
    "mood_atmosphere": "Functional and focused on the task.",
    "patterns": "Wood grain on the cutting board, grass texture on the surface, and stone texture on the wall."
  },
  "cinematography": {
    "camera_motion": "Static",
    "framing": "Medium shot",
    "camera_angle": "High angle",
    "depth_of_field": "Deep",
    "focus": "Sharp focus on the robotic arm, cutting board, and chili pepper",
    "lens_focal_length": "Standard"
  },
  "style_medium": "Robotics video",
  "artistic_style": "Realistic simulation",
  "context": "A demonstration of a robotic arm performing a task of picking up an object and moving it towards a microwave.",
  "actions": [
    {
      "time": "0-2s",
      "description": "The robotic arm moves downwards towards the red chili pepper on the wooden cutting board.",
      "actor": "Robotic arm",
      "object": "Red chili pepper",
      "contact_event": "The gripper makes contact with the chili pepper.",
      "state_change": "The chili pepper is grasped by the gripper."
    },
    {
      "time": "2-4s",
      "description": "The robotic arm lifts the red chili pepper off the wooden cutting board.",
      "actor": "Robotic arm",
      "object": "Red chili pepper",
      "contact_event": "The chili pepper is lifted from the board.",
      "state_change": "The chili pepper is now suspended in the air."
    },
    {
      "time": "4-6s",
      "description": "The robotic arm moves the red chili pepper towards the open microwave oven.",
      "actor": "Robotic arm",
      "object": "Red chili pepper",
      "contact_event": "The chili pepper approaches the microwave opening.",
      "state_change": "The chili pepper is moved closer to the microwave."
    }
  ],
  "text_and_signage_elements": [],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0-2s",
      "description": "The robotic arm moves towards the chili pepper.",
      "key_changes": "The arm descends to grasp the pepper.",
      "camera": "Static high angle shot",
      "robot_motion": "Moving downwards",
      "object_motion": "Static until grasped",
      "physical_logic": "The arm moves to interact with the object."
    },
    {
      "segment_index": 1,
      "time_range": "2-4s",
      "description": "The robotic arm lifts the chili pepper.",
      "key_changes": "The chili pepper is lifted off the board.",
      "camera": "Static high angle shot",
      "robot_motion": "Lifting upwards",
      "object_motion": "The chili pepper is lifted.",
      "physical_logic": "The gripper secures the pepper and lifts it."
    },
    {
      "segment_index": 2,
      "time_range": "4-6s",
      "description": "The robotic arm moves the chili pepper towards the microwave.",
      "key_changes": "The chili pepper is moved closer to the microwave.",
      "camera": "Static high angle shot",
      "robot_motion": "Moving rightwards",
      "object_motion": "The chili pepper is moved.",
      "physical_logic": "The arm moves the object towards the target location."
    }
  ],
  "transitions": [
    "Continuous motion from grasping to lifting to moving towards the microwave."
  ],
  "resolution": {
    "W": 1152,
    "H": 816
  },
  "aspect_ratio": "16,9",
  "duration": "6.033s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "White robotic arm with black gripper, mounted on a base with a green indicator light.",
    "end_effector": "Black gripper with yellow accents.",
    "manipulated_objects": [
      "Red chili pepper",
      "Wooden cutting board",
      "Microwave oven"
    ],
    "task_goal": "Pick up the red chili pepper and move it towards the microwave oven.",
    "initial_state": "The robotic arm is positioned above the wooden cutting board with the red chili pepper resting on it. The microwave oven is open.",
    "terminal_state": "The robotic arm has lifted the red chili pepper and moved it towards the open microwave oven.",
    "affordances": [
      "Grasping",
      "Lifting",
      "Moving"
    ],
    "failure_modes": [
      "The gripper may fail to grasp the chili pepper securely.",
      "The arm may collide with the microwave oven."
    ]
  },
  "physical_reasoning": {
    "contact_and_grasp": "The robotic arm's gripper makes contact with the red chili pepper and secures it.",
    "support_and_stability": "The wooden cutting board provides a stable surface for the chili pepper before it is lifted.",
    "force_gravity_friction": "The gripper applies sufficient force to overcome gravity and lift the chili pepper.",
    "causal_chain": "The robotic arm moves towards the chili pepper, grasps it, lifts it, and moves it towards the microwave.",
    "object_permanence": "The chili pepper remains in the gripper's grasp throughout the sequence.",
    "next_action_prediction": "The robotic arm will place the red chili pepper inside the microwave oven."
  },
  "uncertainties": [],
  "comprehensive_t2v_caption": "A white robotic arm with a black gripper is positioned on a grassy surface, interacting with a wooden cutting board that holds a single red chili pepper. To the right, a beige microwave oven sits with its door open. The robot arm moves from above the cutting board, grasps the chili pepper, lifts it, and moves it towards the open microwave. The background features a textured stone wall and a patterned rug with a sheepskin-like texture in the foreground. The lighting is even and bright, highlighting the objects and the robot's movements."
}
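
Reviewer sketch: the output above reports "aspect_ratio": "16,9" next to a resolution of 1152 x 816, and the two can be cross-checked mechanically. The snippet below is a minimal illustration, not part of the captioning pipeline; the helper names `reduced_ratio` and `ratio_matches` and the 2% tolerance are assumptions made for this example.

```python
from math import gcd


def reduced_ratio(width: int, height: int) -> str:
    """Return the exact W:H ratio in lowest terms, e.g. 1920x1080 -> '16:9'."""
    d = gcd(width, height)
    return f"{width // d}:{height // d}"


def ratio_matches(reported: str, width: int, height: int, tol: float = 0.02) -> bool:
    """Check a reported ratio string against the resolution.

    Accepts ':' or ',' as the separator, since both appear in the outputs.
    `tol` is a relative tolerance (an assumption for this sketch).
    """
    parts = reported.replace(",", ":").split(":")
    if len(parts) != 2:
        return False
    try:
        rw, rh = float(parts[0]), float(parts[1])
    except ValueError:
        return False
    if rw <= 0 or rh <= 0 or width <= 0 or height <= 0:
        return False
    actual = width / height
    return abs(rw / rh - actual) <= tol * actual


if __name__ == "__main__":
    print(reduced_ratio(1152, 816), ratio_matches("16,9", 1152, 816))
```

Under these assumptions the check flags the reported ratio for this clip, because 1152 x 816 reduces to 24:17 rather than 16:9.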

Cosmos 3 Super (64B) / p09: Reproduction schema

ok
p09: Reproduction schema · Cosmos 3 Super (64B) · 19.6s · 2026-06-10T18:56:11+0000

Short Read

A white robotic arm picks up a red chili pepper from a wooden cutting board and moves it toward an open microwave oven.

Training Caption Field

A white robotic arm with a black gripper is positioned over a wooden cutting board on a surface patterned with grass and wood planks. A bright red chili pepper rests on the cutting board. The robot arm moves its gripper towards the chili, closes around it, and lifts it off the board. The arm then moves the chili towards an open, older model microwave oven located to the right of the cutting board. The microwave has a glass door and control knobs on the right side. The background consists of a dark, textured wall.

Full output JSON/text
{
  "schema_version": "p09_reproduction_v1",
  "one_sentence_summary": "A white robotic arm picks up a red chili pepper from a wooden cutting board and moves it toward an open microwave oven.",
  "video_metadata": {
    "resolution": {
      "W": 1152,
      "H": 816
    },
    "aspect_ratio": "16,9",
    "duration_sec": 6.033,
    "fps": 30.0,
    "frame_count": 181
  },
  "camera": {
    "viewpoint": "High-angle, static shot looking down at the workspace.",
    "framing": "Medium shot capturing the robot arm, cutting board, microwave, and surrounding surface.",
    "camera_angle": "High angle, slightly tilted.",
    "camera_motion": "Static.",
    "estimated_height_or_distance": "Approximately 1.5 meters above the workspace.",
    "lens_or_distortion": "Standard lens with minimal distortion.",
    "crop_boundaries": "Full frame visible.",
    "stability": "Stable, no motion blur."
  },
  "scene_layout": {
    "coordinate_frame": "2D image plane with the robot arm on the left, cutting board in the center, and microwave on the right.",
    "background": "A textured surface resembling grass and wood planks, with a dark stone-like wall in the background.",
    "surface": "A flat surface covered with a printed or painted pattern of grass and wooden planks.",
    "lighting": "Even, diffused lighting with soft shadows.",
    "global_spatial_map": "Robot arm (left) -> Cutting board with chili (center) -> Microwave (right).",
    "occlusions": "The robot arm partially occludes the cutting board and microwave as it moves."
  },
  "subjects": [
    {
      "id": "robot_arm",
      "category": "robotic_arm",
      "appearance": "White, multi-jointed robotic arm with a black gripper at the end. It has a green indicator light on its base.",
      "visible_parts": "Base, joints, gripper.",
      "initial_pose": "Arm extended towards the cutting board, gripper open.",
      "final_pose": "Arm extended towards the microwave, gripper holding the chili.",
      "workspace_location": "Left side of the frame, moving towards the center and right.",
      "motion_path": "Moves from the cutting board towards the microwave.",
      "end_effector": "Black gripper.",
      "uncertainties": "None."
    }
  ],
  "objects": [
    {
      "id": "chili_pepper",
      "category": "vegetable",
      "appearance": "Bright red, elongated chili pepper with a green stem.",
      "material": "Organic, smooth surface.",
      "shape": "Elongated, slightly curved.",
      "size_relative_to_frame": "Small.",
      "initial_pose": "Resting on the wooden cutting board.",
      "final_pose": "Held by the robot gripper, moving towards the microwave.",
      "state_changes": "Picked up by the robot gripper.",
      "uncertainties": "None."
    },
    {
      "id": "cutting_board",
      "category": "cutting_board",
      "appearance": "Rectangular wooden board with a natural wood grain pattern.",
      "material": "Wood.",
      "shape": "Rectangular.",
      "size_relative_to_frame": "Medium.",
      "initial_pose": "Resting on the grass-patterned surface.",
      "final_pose": "Remains stationary.",
      "state_changes": "No significant change.",
      "uncertainties": "None."
    },
    {
      "id": "microwave_oven",
      "category": "microwave_oven",
      "appearance": "Older model, beige and brown microwave with a glass door and control knobs on the right side. The door is open.",
      "material": "Plastic and metal.",
      "shape": "Rectangular box.",
      "size_relative_to_frame": "Large.",
      "initial_pose": "Stationary with the door open.",
      "final_pose": "Stationary with the door open.",
      "state_changes": "No significant change.",
      "uncertainties": "None."
    }
  ],
  "initial_state": "The robot arm is positioned above the cutting board, which holds a red chili pepper. The microwave door is open.",
  "final_state": "The robot arm has picked up the chili pepper and is moving it towards the open microwave.",
  "timeline": [
    {
      "time_range": "0.0-2.0s",
      "description": "The robot arm moves its gripper towards the chili pepper on the cutting board.",
      "active_subjects": [
        "robot_arm"
      ],
      "object_states": {
        "chili_pepper": "Resting on the cutting board.",
        "cutting_board": "Stationary.",
        "microwave_oven": "Stationary with door open."
      },
      "robot_motion": "Arm extends and lowers towards the chili.",
      "object_motion": "None.",
      "gripper_state": "Open.",
      "contact_state": "Gripper approaches the chili.",
      "spatial_changes": "Robot arm moves closer to the chili."
    },
    {
      "time_range": "2.0-3.5s",
      "description": "The robot gripper closes around the chili pepper.",
      "active_subjects": [
        "robot_arm"
      ],
      "object_states": {
        "chili_pepper": "Being grasped by the gripper.",
        "cutting_board": "Stationary.",
        "microwave_oven": "Stationary with door open."
      },
      "robot_motion": "Gripper closes.",
      "object_motion": "Chili is lifted slightly.",
      "gripper_state": "Closed.",
      "contact_state": "Gripper grasps the chili.",
      "spatial_changes": "Chili is lifted off the cutting board."
    },
    {
      "time_range": "3.5-6.0s",
      "description": "The robot arm lifts the chili pepper and moves it towards the open microwave.",
      "active_subjects": [
        "robot_arm"
      ],
      "object_states": {
        "chili_pepper": "Held by the gripper, moving towards the microwave.",
        "cutting_board": "Stationary.",
        "microwave_oven": "Stationary with door open."
      },
      "robot_motion": "Arm retracts and moves right towards the microwave.",
      "object_motion": "Chili is carried by the arm.",
      "gripper_state": "Closed.",
      "contact_state": "Gripper holds the chili.",
      "spatial_changes": "Chili moves from the cutting board to near the microwave."
    }
  ],
  "parallel_action_graph": {
    "simultaneous_actions": [],
    "dependencies": [
      {
        "type": "sequence",
        "from": "Gripper opens",
        "to": "Gripper closes"
      },
      {
        "type": "sequence",
        "from": "Gripper closes",
        "to": "Arm moves towards microwave"
      }
    ],
    "ordering_constraints": [
      "Robot arm moves towards chili.",
      "Gripper closes on chili.",
      "Robot arm moves chili towards microwave."
    ]
  },
  "keyframes": [
    {
      "time": 0.0,
      "frame_description": "Robot arm positioned above the cutting board with the chili pepper.",
      "subject_poses": {
        "robot_arm": "Extended towards the cutting board, gripper open."
      },
      "object_poses": {
        "chili_pepper": "Resting on the cutting board.",
        "cutting_board": "Stationary.",
        "microwave_oven": "Stationary with door open."
      },
      "contacts": [],
      "important_visual_details": "Robot arm is white with a black gripper. The chili is bright red."
    },
    {
      "time": 2.0,
      "frame_description": "Robot gripper closes around the chili pepper.",
      "subject_poses": {
        "robot_arm": "Gripper closed on the chili."
      },
      "object_poses": {
        "chili_pepper": "Grasped by the gripper.",
        "cutting_board": "Stationary.",
        "microwave_oven": "Stationary with door open."
      },
      "contacts": [
        {
          "subject_id": "robot_arm",
          "object_id": "chili_pepper",
          "contact_type": "grasp"
        }
      ],
      "important_visual_details": "The chili is lifted slightly off the cutting board."
    },
    {
      "time": 4.0,
      "frame_description": "Robot arm moves the chili towards the microwave.",
      "subject_poses": {
        "robot_arm": "Arm extended towards the microwave, holding the chili."
      },
      "object_poses": {
        "chili_pepper": "Held by the gripper, moving towards the microwave.",
        "cutting_board": "Stationary.",
        "microwave_oven": "Stationary with door open."
      },
      "contacts": [
        {
          "subject_id": "robot_arm",
          "object_id": "chili_pepper",
          "contact_type": "hold"
        }
      ],
      "important_visual_details": "The chili is now near the open microwave door."
    }
  ],
  "contact_events": [
    {
      "time": 2.0,
      "subject_id": "robot_arm",
      "object_id": "chili_pepper",
      "contact_type": "grasp",
      "contact_location": "Gripper closes around the chili.",
      "grasp_or_support_quality": "Secure grasp.",
      "release_or_transfer": "No release.",
      "visible_evidence": "Chili is lifted slightly off the cutting board."
    }
  ],
  "physics_and_affordances": {
    "support_surfaces": "Cutting board supports the chili initially. The robot gripper supports the chili after grasping.",
    "gravity_and_stability": "The chili is lifted against gravity by the robot arm.",
    "force_or_constraint_reasoning": "The robot arm applies force to grasp and lift the chili.",
    "collision_or_near_collision": "None observed.",
    "object_permanence": "The chili is moved from the cutting board to near the microwave.",
    "task_goal": "Pick up the chili and move it towards the microwave.",
    "failure_modes": "None observed.",
    "next_action_prediction": "The robot arm will place the chili inside the microwave."
  },
  "appearance_fidelity": {
    "colors": "White robot arm, red chili, brown microwave, green grass pattern.",
    "textures": "Smooth robot arm, natural wood grain on cutting board, plastic microwave.",
    "lighting_shadows": "Even lighting with soft shadows.",
    "robot_morphology": "Multi-jointed white robotic arm with a black gripper.",
    "object_fine_details": "Green stem on the chili, control knobs on the microwave.",
    "background_fine_details": "Grass and wood plank pattern on the surface, dark stone-like wall in the background."
  },
  "text_and_labels": [],
  "audio_description": "Mechanical sounds of the robot arm moving and the gripper closing.",
  "generation_controls": {
    "shot_type": "Static high-angle shot.",
    "camera_motion_instruction": "None.",
    "motion_style": "Smooth, controlled robotic motion.",
    "duration_instruction": "6 seconds.",
    "spatial_consistency_requirements": "Maintain consistent spatial relationships between objects.",
    "negative_constraints": "None.",
    "recommended_caption_style": "Descriptive, factual."
  },
  "uncertainties": [],
  "comprehensive_reproduction_caption": "A white robotic arm with a black gripper is positioned over a wooden cutting board on a surface patterned with grass and wood planks. A bright red chili pepper rests on the cutting board. The robot arm moves its gripper towards the chili, closes around it, and lifts it off the board. The arm then moves the chili towards an open, older model microwave oven located to the right of the cutting board. The microwave has a glass door and control knobs on the right side. The background consists of a dark, textured wall."
}
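
Reviewer sketch: the p09 output carries enough metadata to sanity-check itself, since `frame_count` should agree with `duration_sec` times `fps` and the `timeline` ranges should tile the clip without gaps. The code below is one possible way to run both checks; the function names and the 0.1 s tolerance are assumptions for illustration, not part of the p09 schema.

```python
import re

# Matches plain-seconds ranges such as "0.0-2.0s" or "2.0-3.5s".
_RANGE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*s?\s*")


def expected_frame_count(duration_sec: float, fps: float) -> int:
    """Frame count implied by duration and frame rate, rounded to nearest."""
    return round(duration_sec * fps)


def parse_range(text: str) -> tuple[float, float]:
    """Parse 'start-end' in seconds; raise ValueError on any other shape."""
    m = _RANGE.fullmatch(text)
    if m is None:
        raise ValueError(f"unrecognized time range: {text!r}")
    return float(m.group(1)), float(m.group(2))


def timeline_problems(ranges: list[str], duration_sec: float, tol: float = 0.1) -> list[str]:
    """Return human-readable problems; an empty list means the timeline tiles the clip."""
    problems = []
    cursor = 0.0
    for text in ranges:
        start, end = parse_range(text)
        if abs(start - cursor) > tol:
            problems.append(f"{text}: starts at {start}, expected {cursor}")
        if end <= start:
            problems.append(f"{text}: end is not after start")
        cursor = end
    if abs(cursor - duration_sec) > tol:
        problems.append(f"timeline ends at {cursor}, duration is {duration_sec}")
    return problems


if __name__ == "__main__":
    print(expected_frame_count(6.033, 30.0))
    print(timeline_problems(["0.0-2.0s", "2.0-3.5s", "3.5-6.0s"], 6.033))
```

With the values reported above, both checks pass: the implied frame count is 181 and the three timeline ranges cover the clip within the assumed tolerance.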

rovid_009

Qwen 3.5 9B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 9B · 77.0s · 2026-06-10T17:51:32+0000

Short Read

At the beginning, a robotic arm hovers over a table with several bottles. In the middle, the arm descends and grasps the yellow-capped bottle on the right. By the end, the arm lifts the bottle, causing the adjacent green-capped bottle to tip over, and places the yellow bottle into the blue bin next to the other yellow bottle.

Training Caption Field

A black robotic arm with a two-finger gripper enters a high-angle view of a white marble table, where a blue bin sits centrally containing a yellow bottle. The robot descends and grasps a second yellow bottle on the right side of the table. As it lifts the bottle, the motion causes an adjacent green-capped bottle to tip over onto its side. The robot then carefully places the lifted yellow bottle into the blue bin, positioning it next to the first yellow bottle, completing the transfer task.

Full output JSON/text
{
  "scene_imagination": "A high-angle, static shot of a white marble-patterned tabletop. A blue plastic bin sits centrally, containing a yellow bottle with a white cap. To the left is a white Elmer's glue bottle. To the right are two other bottles, one with a green cap and one with a yellow cap. A black robotic arm with a gripper enters from the top right.",
  "temporal_caption": "At the beginning, a robotic arm hovers over a table with several bottles. In the middle, the arm descends and grasps the yellow-capped bottle on the right. By the end, the arm lifts the bottle, causing the adjacent green-capped bottle to tip over, and places the yellow bottle into the blue bin next to the other yellow bottle.",
  "audio_description": null,
  "subjects": [
    {
      "description": "Robotic arm",
      "appearance_details": "Black articulated arm with a black two-finger gripper end-effector.",
      "relationship": "Manipulator of the bottles.",
      "location": "Enters from the top right corner.",
      "relative_size": "Large compared to the bottles.",
      "orientation": "Vertical, descending towards the table.",
      "pose": "Dynamic, moving from a hover to a grasp and lift.",
      "action": "Grasps and moves the yellow bottle.",
      "state_changes": "Moves from outside the frame to interacting with the objects."
    },
    {
      "description": "Yellow bottle",
      "appearance_details": "Plastic bottle with yellow liquid, orange label, and white cap.",
      "relationship": "Target object for the robot.",
      "location": "Initially on the right side of the table, then inside the blue bin.",
      "relative_size": "Small, similar to other bottles.",
      "orientation": "Upright, then tilted during grasp, then upright in the bin.",
      "pose": "Standing, being lifted, placed.",
      "action": "Being moved by the robot.",
      "state_changes": "Moved from table surface to inside the blue bin."
    },
    {
      "description": "Green-capped bottle",
      "appearance_details": "Plastic bottle with white body, orange label, and green cap.",
      "relationship": "Neighbor to the target bottle.",
      "location": "Right side of the table.",
      "relative_size": "Small.",
      "orientation": "Upright, then tilted.",
      "pose": "Standing, then falling.",
      "action": "Falls over due to collision.",
      "state_changes": "Tipped over onto its side."
    },
    {
      "description": "Blue bin",
      "appearance_details": "Rectangular blue plastic container with white grips.",
      "relationship": "Receptacle for the bottles.",
      "location": "Center of the table.",
      "relative_size": "Medium.",
      "orientation": "Static, upright.",
      "pose": "Stationary.",
      "action": "Receiving a bottle.",
      "state_changes": "Contents increase by one bottle."
    }
  ],
  "background_setting": "A clean, minimalist workspace featuring a white surface with grey marble-like veining and a plain white wall in the background.",
  "lighting": {
    "conditions": "Bright, diffuse indoor lighting.",
    "direction": "Overhead, casting soft shadows.",
    "shadows": "Soft shadows cast by the bottles and the robot arm.",
    "illumination_effect": "Even illumination with high visibility of object details."
  },
  "aesthetics": {
    "composition": "Centered composition with the bin as the focal point.",
    "color_scheme": "Primary colors (blue, yellow, red, green) against a neutral white background.",
    "mood_atmosphere": "Clinical, precise, and functional.",
    "patterns": "Marble pattern on the table surface."
  },
  "cinematography": {
    "framing": "Medium shot, top-down perspective.",
    "camera_angle": "High angle, looking down at the table.",
    "depth_of_field": "Deep focus, keeping all objects sharp.",
    "focus": "Sharp focus on the central area of the table.",
    "lens_focal_length": "Unknown, likely a standard lens for overhead shots.",
    "camera_motion": "Static camera."
  },
  "style_medium": "robotics video",
  "artistic_style": "real-world robot manipulation footage",
  "context": "A demonstration of a robot's ability to grasp and place objects, specifically handling a bottle in a cluttered environment.",
  "actions": [
    {
      "time": "00:00 - 00:02",
      "description": "The robotic arm enters the frame from the right and approaches the yellow bottle.",
      "actor": "Robotic arm",
      "object": "Yellow bottle",
      "contact_event": "None.",
      "state_change": "None.",
      "evidence": "Arm movement visible."
    },
    {
      "time": "00:02 - 00:05",
      "description": "The gripper closes around the yellow bottle and lifts it.",
      "actor": "Robotic arm",
      "object": "Yellow bottle",
      "contact_event": "Grasp established.",
      "state_change": "Bottle lifted off the table.",
      "evidence": "Bottle is suspended in the air."
    },
    {
      "time": "00:05 - 00:07",
      "description": "The robot moves the bottle towards the blue bin, causing the green-capped bottle to tip over.",
      "actor": "Robotic arm",
      "object": "Green-capped bottle",
      "contact_event": "Collision with green bottle.",
      "state_change": "Green bottle tips over.",
      "evidence": "Green bottle falls to the side."
    },
    {
      "time": "00:07 - 00:09",
      "description": "The robot places the yellow bottle into the blue bin.",
      "actor": "Robotic arm",
      "object": "Blue bin",
      "contact_event": "Placement.",
      "state_change": "Yellow bottle is now inside the bin.",
      "evidence": "Bottle rests in the bin."
    }
  ],
  "text_and_signage_elements": [
    {
      "text": "Elmer's",
      "category": "Brand logo",
      "appearance": "Blue and red logo on a white bottle.",
      "spatial": "Left side of the table.",
      "spatial_temporal": "Static throughout the video.",
      "context": "Identifies the glue bottle."
    }
  ],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "00:00 - 00:02",
      "description": "The robot arm approaches the target bottle.",
      "key_changes": "Arm enters frame.",
      "camera": "Static.",
      "robot_motion": "Approaching.",
      "object_motion": "None.",
      "physical_logic": "Planning trajectory."
    },
    {
      "segment_index": 2,
      "time_range": "00:02 - 00:05",
      "description": "The robot grasps the yellow bottle.",
      "key_changes": "Grasp established, bottle lifted.",
      "camera": "Static.",
      "robot_motion": "Lifting.",
      "object_motion": "Vertical translation.",
      "physical_logic": "Normal force exceeds gravity."
    },
    {
      "segment_index": 3,
      "time_range": "00:05 - 00:07",
      "description": "The robot moves the bottle over the bin, knocking over a neighbor.",
      "key_changes": "Collision occurs, green bottle falls.",
      "camera": "Static.",
      "robot_motion": "Horizontal translation.",
      "object_motion": "Green bottle rotates and falls.",
      "physical_logic": "Collision transfer of momentum."
    },
    {
      "segment_index": 4,
      "time_range": "00:07 - 00:09",
      "description": "The robot places the bottle into the bin.",
      "key_changes": "Bottle settles in the bin.",
      "camera": "Static.",
      "robot_motion": "Retracting.",
      "object_motion": "Settling.",
      "physical_logic": "Gravity pulls bottle into the container."
    }
  ],
  "transitions": [
    "Entry of the robot arm",
    "Grasp and lift of the yellow bottle",
    "Collision and fall of the green bottle",
    "Placement of the yellow bottle into the bin"
  ],
  "resolution": {
    "W": 1360,
    "H": 688
  },
  "aspect_ratio": "16:9",
  "duration": "9.767s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Black articulated robotic arm.",
    "end_effector": "Black two-finger gripper.",
    "manipulated_objects": [
      "Yellow bottle",
      "Green-capped bottle"
    ],
    "task_goal": "Move the yellow bottle from the table to the blue bin.",
    "initial_state": "Bottles on the table, robot arm outside the frame.",
    "terminal_state": "Yellow bottle in the bin, green bottle on its side.",
    "affordances": "Gripper designed for cylindrical objects.",
    "failure_modes": "None observed; task completed successfully."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The gripper successfully encloses the bottle neck.",
    "support_and_stability": "The robot supports the bottle's weight against gravity.",
    "force_gravity_friction": "Gravity pulls the green bottle down after the collision; friction keeps the yellow bottle in the bin.",
    "causal_chain": "Robot lifts yellow bottle -> moves over green bottle -> green bottle tips -> robot places yellow bottle in bin.",
    "object_permanence": "All objects remain intact throughout the interaction.",
    "next_action_prediction": "The robot may proceed to pick up the green bottle or another object."
  },
  "uncertainties": [
    "The specific contents of the bottles are unknown."
  ],
  "comprehensive_t2v_caption": "A black robotic arm with a two-finger gripper enters a high-angle view of a white marble table, where a blue bin sits centrally containing a yellow bottle. The robot descends and grasps a second yellow bottle on the right side of the table. As it lifts the bottle, the motion causes an adjacent green-capped bottle to tip over onto its side. The robot then carefully places the lifted yellow bottle into the blue bin, positioning it next to the first yellow bottle, completing the transfer task."
}
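
Reviewer sketch: p08 outputs in this review write time ranges in more than one style ("0-2s" in one output, "00:00 - 00:02" in the one above), which makes side-by-side comparison awkward. The snippet below shows one hedged way to normalize both styles to seconds and to measure how much of the clip the last segment leaves uncovered; `to_seconds`, `normalize_range`, and `uncovered_tail` are hypothetical helpers written for this example.

```python
import re

# mm:ss with an optional fractional part, e.g. "00:07" or "01:02.5".
_CLOCK = re.compile(r"^(\d+):(\d{2})(?:\.(\d+))?$")


def to_seconds(token: str) -> float:
    """Convert '00:07', '7', '7s', or '3.5s' to seconds."""
    token = token.strip().rstrip("s").strip()
    m = _CLOCK.match(token)
    if m:
        minutes, seconds, frac = m.groups()
        value = int(minutes) * 60 + int(seconds)
        if frac:
            value += float("0." + frac)
        return float(value)
    return float(token)  # raises ValueError if the token is not numeric


def normalize_range(text: str) -> tuple[float, float]:
    """Split on the first '-' and convert both ends to seconds."""
    start, sep, end = text.partition("-")
    if not sep:
        raise ValueError(f"no '-' separator in {text!r}")
    return to_seconds(start), to_seconds(end)


def uncovered_tail(ranges: list[str], duration_sec: float) -> float:
    """Seconds between the end of the last range and the end of the clip."""
    last_end = max(normalize_range(r)[1] for r in ranges)
    return max(0.0, duration_sec - last_end)


if __name__ == "__main__":
    segs = ["00:00 - 00:02", "00:02 - 00:05", "00:05 - 00:07", "00:07 - 00:09"]
    print([normalize_range(s) for s in segs], uncovered_tail(segs, 9.767))
```

For the segments above, the ranges end at 9 s while the reported duration is 9.767 s, so a little under a second of the clip is not assigned to any segment.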

Qwen 3.5 27B / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Qwen 3.5 27B · 238.0s · 2026-06-10T18:23:39+0000

Short Read

At the beginning, the scene is static with the robotic arm positioned over the yellow bottle on the right. Early in the sequence, the gripper descends and closes around the neck of the yellow bottle. In the middle, the arm lifts the bottle, causing the adjacent white bottle with the green cap to tip over. Late in the sequence, the arm moves the yellow bottle over the blue bin. At the end, the arm releases the bottle into the bin next to the other yellow bottle.

Training Caption Field

A high-angle, static shot captures a white marble-patterned tabletop where a black robotic arm performs a sorting task. Initially, a white Elmer's glue bottle sits on the left, a blue bin with a yellow bottle is in the center, and two other bottles (one white with a green cap, one yellow with an orange label) are on the right. The robotic arm descends, its three-pronged gripper closing around the neck of the yellow bottle on the right. As the arm lifts the bottle, it inadvertently knocks over the adjacent white bottle with the green cap. The arm then transports the yellow bottle over the blue bin and releases it, placing it next to the other yellow bottle already inside. The final state shows the blue bin containing two yellow bottles, the tipped white bottle on the table, and the Elmer's glue bottle undisturbed on the left.

Full output JSON/text
{
  "scene_imagination": "A high-angle, static shot captures a white marble-patterned tabletop. On the left, a white Elmer's glue bottle with an orange cap sits stationary. In the center, a blue rectangular plastic bin contains a yellow bottle with a white cap. To the right of the bin, two more bottles are present: one white with a green cap and another yellow with an orange label and white cap. A black robotic arm with a three-pronged gripper enters from the top right. The lighting is bright and diffuse, casting soft shadows.",
  "temporal_caption": "At the beginning, the scene is static with the robotic arm positioned over the yellow bottle on the right. Early in the sequence, the gripper descends and closes around the neck of the yellow bottle. In the middle, the arm lifts the bottle, causing the adjacent white bottle with the green cap to tip over. Late in the sequence, the arm moves the yellow bottle over the blue bin. At the end, the arm releases the bottle into the bin next to the other yellow bottle.",
  "audio_description": null,
  "subjects": [
    {
      "description": "A black robotic arm with a three-pronged gripper.",
      "appearance_details": "Black, matte finish, articulated joints, three-pronged end-effector.",
      "relationship": "Manipulator of the yellow bottle.",
      "location": "Enters from top right, moves to center.",
      "relative_size": "Large relative to the bottles.",
      "orientation": "Vertical, descending.",
      "pose": "Active manipulation.",
      "action": "Grasping, lifting, moving, and releasing.",
      "state_changes": "Moves from idle to active manipulation."
    },
    {
      "description": "A yellow bottle with an orange label and white cap.",
      "appearance_details": "Yellow body, white cap, orange label with text.",
      "relationship": "Object being manipulated.",
      "location": "Initially on the right, moved to the blue bin.",
      "relative_size": "Small.",
      "orientation": "Upright, then tilted during lift, then upright in bin.",
      "pose": "Stationary, then moved.",
      "action": "Being grasped and moved.",
      "state_changes": "Position changes from table to bin."
    },
    {
      "description": "A white bottle with a green cap.",
      "appearance_details": "White body, green cap, black and orange label.",
      "relationship": "Bystander object.",
      "location": "To the right of the bin.",
      "relative_size": "Small.",
      "orientation": "Upright, then tipped over.",
      "pose": "Stationary, then fallen.",
      "action": "Tipped over by the moving bottle.",
      "state_changes": "Falls from upright to lying on its side."
    },
    {
      "description": "A blue plastic bin.",
      "appearance_details": "Blue, rectangular, with white handles.",
      "relationship": "Container for the bottles.",
      "location": "Center of the frame.",
      "relative_size": "Medium.",
      "orientation": "Upright.",
      "pose": "Stationary.",
      "action": "Receiving the yellow bottle.",
      "state_changes": "Contents increase from one to two bottles."
    },
    {
      "description": "A white Elmer's glue bottle.",
      "appearance_details": "White body, orange cap, blue and red label.",
      "relationship": "Bystander object.",
      "location": "Far left.",
      "relative_size": "Small.",
      "orientation": "Upright.",
      "pose": "Stationary.",
      "action": "None.",
      "state_changes": "None."
    },
    {
      "description": "A yellow bottle with a white cap inside the bin.",
      "appearance_details": "Yellow body, white cap, red and black label.",
      "relationship": "Object already in the bin.",
      "location": "Inside the blue bin.",
      "relative_size": "Small.",
      "orientation": "Upright.",
      "pose": "Stationary.",
      "action": "None.",
      "state_changes": "None."
    }
  ],
  "background_setting": "A clean, white marble-patterned tabletop against a plain white wall, suggesting a controlled laboratory or studio environment.",
  "lighting": {
    "conditions": "Bright, even, and diffuse.",
    "direction": "Overhead and frontal.",
    "shadows": "Soft, subtle shadows cast by the objects and the robot arm.",
    "illumination_effect": "Clear visibility of all objects and actions without harsh glare."
  },
  "aesthetics": {
    "composition": "Centered on the blue bin, with objects arranged symmetrically around it.",
    "color_scheme": "Dominant white background with pops of blue, yellow, orange, and green from the objects.",
    "mood_atmosphere": "Clinical, precise, and functional.",
    "patterns": "Marble pattern on the table surface."
  },
  "cinematography": {
    "framing": "Medium shot, capturing the entire workspace.",
    "camera_angle": "High-angle, looking down at the table.",
    "depth_of_field": "Deep, keeping all objects in focus.",
    "focus": "Sharp on the central action area.",
    "lens_focal_length": "Unknown, likely a standard lens.",
    "camera_motion": "Static."
  },
  "style_medium": "robotics video",
  "artistic_style": "real-world robot manipulation footage",
  "context": "The inferred task is for the robot to sort or organize bottles by moving a specific yellow bottle from the table into a designated blue bin.",
  "actions": [
    {
      "time": "00:00-00:02",
      "description": "The robotic arm positions its gripper over the yellow bottle on the right.",
      "actor": "Robotic arm",
      "object": "Yellow bottle",
      "contact_event": "Approaching",
      "state_change": "None",
      "evidence": "Gripper moves towards the bottle."
    },
    {
      "time": "00:02-00:04",
      "description": "The gripper closes around the neck of the yellow bottle.",
      "actor": "Robotic arm",
      "object": "Yellow bottle",
      "contact_event": "Grasping",
      "state_change": "Bottle is secured.",
      "evidence": "Gripper fingers close on the bottle."
    },
    {
      "time": "00:04-00:06",
      "description": "The arm lifts the yellow bottle, causing the adjacent white bottle to fall over.",
      "actor": "Robotic arm",
      "object": "Yellow bottle, White bottle",
      "contact_event": "Lifting, Collision",
      "state_change": "Yellow bottle is airborne; White bottle is tipped.",
      "evidence": "Yellow bottle rises; White bottle falls."
    },
    {
      "time": "00:06-00:08",
      "description": "The arm moves the yellow bottle over the blue bin.",
      "actor": "Robotic arm",
      "object": "Yellow bottle",
      "contact_event": "Moving",
      "state_change": "Position change.",
      "evidence": "Bottle is transported over the bin."
    },
    {
      "time": "00:08-00:09",
      "description": "The gripper releases the yellow bottle into the bin.",
      "actor": "Robotic arm",
      "object": "Yellow bottle",
      "contact_event": "Releasing",
      "state_change": "Bottle is now in the bin.",
      "evidence": "Gripper opens, bottle drops into bin."
    }
  ],
  "text_and_signage_elements": [
    {
      "text": "Elmer's",
      "category": "Brand name",
      "appearance": "Blue and red logo on the white glue bottle.",
      "spatial": "On the leftmost bottle.",
      "spatial_temporal": "Visible throughout.",
      "context": "Product identification."
    },
    {
      "text": "Washable",
      "category": "Product feature",
      "appearance": "Text on the Elmer's glue bottle.",
      "spatial": "On the leftmost bottle.",
      "spatial_temporal": "Visible throughout.",
      "context": "Product description."
    },
    {
      "text": "Non-Toxic",
      "category": "Safety information",
      "appearance": "Text on the yellow bottle being moved.",
      "spatial": "On the yellow bottle on the right.",
      "spatial_temporal": "Visible until the bottle is moved.",
      "context": "Safety warning."
    }
  ],
  "segments": [
    {
      "segment_index": 1,
      "time_range": "00:00-00:02",
      "description": "Initial state with all objects stationary and the robot arm approaching.",
      "key_changes": "Robot arm enters the frame.",
      "camera": "Static.",
      "robot_motion": "Approaching.",
      "object_motion": "None.",
      "physical_logic": "Setup for manipulation."
    },
    {
      "segment_index": 2,
      "time_range": "00:02-00:04",
      "description": "The robot arm grasps the yellow bottle.",
      "key_changes": "Gripper closes on the bottle.",
      "camera": "Static.",
      "robot_motion": "Grasping.",
      "object_motion": "None.",
      "physical_logic": "Establishing contact."
    },
    {
      "segment_index": 3,
      "time_range": "00:04-00:06",
      "description": "The robot lifts the bottle, causing a collision.",
      "key_changes": "Bottle is lifted; adjacent bottle falls.",
      "camera": "Static.",
      "robot_motion": "Lifting.",
      "object_motion": "Yellow bottle up, white bottle down.",
      "physical_logic": "Gravity and collision."
    },
    {
      "segment_index": 4,
      "time_range": "00:06-00:08",
      "description": "The robot transports the bottle to the bin.",
      "key_changes": "Bottle moves over the bin.",
      "camera": "Static.",
      "robot_motion": "Translating.",
      "object_motion": "Yellow bottle moves.",
      "physical_logic": "Controlled movement."
    },
    {
      "segment_index": 5,
      "time_range": "00:08-00:09",
      "description": "The robot releases the bottle into the bin.",
      "key_changes": "Bottle is placed in the bin.",
      "camera": "Static.",
      "robot_motion": "Releasing.",
      "object_motion": "Bottle settles in bin.",
      "physical_logic": "Task completion."
    }
  ],
  "transitions": [
    "Approach to grasp",
    "Grasp to lift",
    "Lift to transport",
    "Transport to release"
  ],
  "resolution": {
    "W": 1360,
    "H": 688
  },
  "aspect_ratio": "19:9.6",
  "duration": "9.767s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "Black articulated robotic arm.",
    "end_effector": "Three-pronged parallel gripper.",
    "manipulated_objects": "Yellow bottle with orange label.",
    "task_goal": "Move the yellow bottle from the table into the blue bin.",
    "initial_state": "Bottle on table, bin contains one bottle.",
    "terminal_state": "Bottle in bin, bin contains two bottles.",
    "affordances": "Gripping by the neck, lifting, placing.",
    "failure_modes": "Dropping the bottle, missing the bin, knocking over other objects."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The gripper successfully makes contact and secures the bottle by its neck.",
    "support_and_stability": "The bottle is stable when held by the gripper and when placed in the bin.",
    "force_gravity_friction": "Gravity causes the tipped bottle to fall; friction keeps the other objects in place.",
    "causal_chain": "Lifting the yellow bottle causes it to collide with the white bottle, which then falls.",
    "object_permanence": "All objects remain visible and consistent throughout the sequence.",
    "next_action_prediction": "The robot may proceed to move the tipped white bottle or another object."
  },
  "uncertainties": [
    "The exact brand of the yellow and white bottles is not fully legible.",
    "The reason for the white bottle tipping is inferred from the motion of the yellow bottle."
  ],
  "comprehensive_t2v_caption": "A high-angle, static shot captures a white marble-patterned tabletop where a black robotic arm performs a sorting task. Initially, a white Elmer's glue bottle sits on the left, a blue bin with a yellow bottle is in the center, and two other bottles (one white with a green cap, one yellow with an orange label) are on the right. The robotic arm descends, its three-pronged gripper closing around the neck of the yellow bottle on the right. As the arm lifts the bottle, it inadvertently knocks over the adjacent white bottle with the green cap. The arm then transports the yellow bottle over the blue bin and releases it, placing it next to the other yellow bottle already inside. The final state shows the blue bin containing two yellow bottles, the tipped white bottle on the table, and the Elmer's glue bottle undisturbed on the left."
}
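
Reviewer note: the metadata fields in the output above can be cross-checked against each other. The following is a minimal sketch, not part of the captioning pipeline; the helper names `reduced_aspect_ratio` and `approx_frame_count` are hypothetical, and the input values are copied from the `resolution`, `duration`, and `fps` fields of this output. It suggests that the reported `aspect_ratio` of "19:9.6" is an approximation of the exact reduced ratio for 1360x688.

```python
from math import gcd


def reduced_aspect_ratio(width: int, height: int) -> str:
    """Return the width:height ratio reduced by the greatest common divisor."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


def approx_frame_count(duration_s: float, fps: float) -> int:
    """Estimate the frame count from duration and frame rate (rounded)."""
    return round(duration_s * fps)


# Values copied from the Qwen 3.5 27B / p08 output above.
ratio = reduced_aspect_ratio(1360, 688)
frames = approx_frame_count(9.767, 30.0)

print(ratio)               # exact reduced ratio for 1360x688
print(frames)              # approximate number of frames in the clip
print(round(1360 / 688, 3), round(19 / 9.6, 3))  # decimal comparison with "19:9.6"
```

The same check applies to the other model outputs for this video, since they report the same resolution, duration, and frame rate.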

Cosmos 3 Nano (16B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Nano (16B) · 17.9s · 2026-06-10T18:37:42+0000

Short Read

At the beginning, the scene shows a blue tray with one yellow-capped bottle inside, a white bottle with an orange cap to the left, and two more bottles to the right. A black robotic arm enters from the top right, grasps the yellow-capped bottle on the right, lifts it, and places it into the blue tray. During this action, the green-capped bottle falls over. The video ends with the yellow-capped bottle now inside the tray next to the other bottle.

Training Caption Field

A top-down view of a white marble countertop with a blue plastic tray in the center. Inside the tray is a yellow-capped bottle with a white body and orange label. To the left of the tray is a white bottle with an orange cap and blue label. To the right of the tray are two more bottles: one with a green cap and white body, and another with a yellow cap and white body with an orange label. A black robotic arm enters from the top right, reaches for the yellow-capped bottle on the right, picks it up, and places it into the blue tray next to the existing bottle. During this action, the green-capped bottle falls over. The video ends with the yellow-capped bottle now inside the tray next to the other bottle.

Full output JSON/text
{
  "scene_imagination": "A top-down view of a white marble countertop with a blue plastic tray in the center. Inside the tray is a yellow-capped bottle with a white body and orange label. To the left of the tray is a white bottle with an orange cap and blue label. To the right of the tray are two more bottles: one with a green cap and white body, and another with a yellow cap and white body with an orange label. A black robotic arm enters from the top right, reaches for the yellow-capped bottle on the right, picks it up, and places it into the blue tray next to the existing bottle. The green-capped bottle falls over during the process. The lighting is bright and even, casting soft shadows.",
  "temporal_caption": "At the beginning, the scene shows a blue tray with one yellow-capped bottle inside, a white bottle with an orange cap to the left, and two more bottles to the right. A black robotic arm enters from the top right, grasps the yellow-capped bottle on the right, lifts it, and places it into the blue tray. During this action, the green-capped bottle falls over. The video ends with the yellow-capped bottle now inside the tray next to the other bottle.",
  "audio_description": "The sound of a robotic arm moving, a slight clinking noise as the bottle is picked up and placed into the tray, and a soft thud as the green-capped bottle falls over.",
  "subjects": [
    {
      "description": "A black robotic arm with a gripper end-effector.",
      "appearance_details": "The arm is black, cylindrical, and segmented, with a black gripper at the end designed to grasp objects.",
      "relationship": "The robotic arm is the agent manipulating the bottles.",
      "location": "Enters from the top right corner.",
      "relative_size": "Large within the frame.",
      "orientation": "Moves downward and to the left.",
      "pose": "Extended, reaching towards the bottles.",
      "action": "Grasps the yellow-capped bottle, lifts it, and places it into the blue tray.",
      "state_changes": "Starts extended, moves to grasp the bottle, lifts it, and then retracts slightly after placing it."
    },
    {
      "description": "A yellow-capped bottle with a white body and orange label.",
      "appearance_details": "The bottle has a yellow cap, a white body, and an orange label with text.",
      "relationship": "One of the bottles being manipulated by the robotic arm.",
      "location": "Initially on the right side of the tray, then moved into the tray.",
      "relative_size": "Small within the frame.",
      "orientation": "Upright.",
      "pose": "Standing upright.",
      "action": "Picked up by the robotic arm and placed into the blue tray.",
      "state_changes": "Moves from the right side of the tray to inside the tray."
    },
    {
      "description": "A green-capped bottle with a white body.",
      "appearance_details": "The bottle has a green cap and a white body with a black and orange label.",
      "relationship": "A bottle that falls over during the robotic arm's action.",
      "location": "On the right side of the tray.",
      "relative_size": "Small within the frame.",
      "orientation": "Initially upright, then falls over.",
      "pose": "Standing upright, then lying on its side.",
      "action": "Falls over when the robotic arm picks up the yellow-capped bottle.",
      "state_changes": "Transitions from upright to lying on its side."
    },
    {
      "description": "A white bottle with an orange cap and blue label.",
      "appearance_details": "The bottle has an orange cap and a white body with a blue label.",
      "relationship": "A stationary bottle on the left side of the tray.",
      "location": "Left side of the tray.",
      "relative_size": "Small within the frame.",
      "orientation": "Upright.",
      "pose": "Standing upright.",
      "action": "Remains stationary throughout the video.",
      "state_changes": "No significant change."
    }
  ],
  "background_setting": "A white marble countertop with subtle gray veining. The surface is clean and uncluttered, providing a neutral backdrop for the objects and the robotic arm's actions.",
  "lighting": {
    "conditions": "Bright, even lighting.",
    "direction": "Top-down lighting.",
    "shadows": "Soft shadows cast by the bottles and the robotic arm.",
    "illumination_effect": "The lighting highlights the glossy surfaces of the bottles and the marble countertop."
  },
  "aesthetics": {
    "composition": "Centered composition with the blue tray in the middle and the bottles arranged around it.",
    "color_scheme": "High contrast with white, blue, yellow, and black elements.",
    "mood_atmosphere": "Clinical and precise.",
    "patterns": "The marble countertop has a subtle veining pattern."
  },
  "cinematography": {
    "camera_angle": "Top-down angle.",
    "framing": "Wide shot capturing the entire workspace.",
    "depth_of_field": "Deep depth of field, keeping all objects in focus.",
    "focus": "Sharp focus on the objects and the robotic arm.",
    "lens_focal_length": "Standard lens.",
    "camera_motion": "Static camera."
  },
  "style_medium": "Robotics video",
  "artistic_style": "Realistic",
  "context": "A demonstration of a robotic arm performing a pick-and-place task with bottles.",
  "actions": [
    {
      "time": "0:00-0:02",
      "description": "The robotic arm enters from the top right and reaches for the yellow-capped bottle on the right.",
      "actor": "Robotic arm",
      "object": "Yellow-capped bottle",
      "contact_event": "The gripper makes contact with the yellow-capped bottle.",
      "state_change": "The yellow-capped bottle is grasped by the robotic arm."
    },
    {
      "time": "0:02-0:05",
      "description": "The robotic arm lifts the yellow-capped bottle and moves it towards the blue tray.",
      "actor": "Robotic arm",
      "object": "Yellow-capped bottle",
      "contact_event": "The yellow-capped bottle is lifted off the countertop.",
      "state_change": "The yellow-capped bottle is moved from the right side of the tray to inside the tray."
    },
    {
      "time": "0:05-0:08",
      "description": "The robotic arm places the yellow-capped bottle into the blue tray next to the existing bottle.",
      "actor": "Robotic arm",
      "object": "Yellow-capped bottle",
      "contact_event": "The yellow-capped bottle is placed into the tray.",
      "state_change": "The yellow-capped bottle is now inside the blue tray."
    },
    {
      "time": "0:08-0:09",
      "description": "The robotic arm retracts slightly after placing the bottle.",
      "actor": "Robotic arm",
      "object": "Yellow-capped bottle",
      "contact_event": "The robotic arm releases the yellow-capped bottle.",
      "state_change": "The robotic arm moves away from the tray."
    }
  ],
  "text_and_signage_elements": [
    {
      "text": "Unreadable text on labels",
      "category": "physical_in_scene",
      "appearance": "Text on the labels of the bottles, mostly unreadable due to distance and angle.",
      "spatial": "On the bottles.",
      "spatial_temporal": "Static throughout the video.",
      "context": "Labels indicating the contents or brand of the bottles."
    }
  ],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0:00-0:02",
      "description": "The robotic arm enters the frame and reaches for the yellow-capped bottle.",
      "key_changes": "The robotic arm enters the scene.",
      "camera": "Static top-down shot.",
      "robot_motion": "The robotic arm extends towards the yellow-capped bottle.",
      "object_motion": "The yellow-capped bottle is grasped by the robotic arm.",
      "physical_logic": "The robotic arm moves to pick up the bottle."
    },
    {
      "segment_index": 1,
      "time_range": "0:02-0:05",
      "description": "The robotic arm lifts the yellow-capped bottle and moves it towards the blue tray.",
      "key_changes": "The yellow-capped bottle is lifted and moved.",
      "camera": "Static top-down shot.",
      "robot_motion": "The robotic arm lifts and moves the yellow-capped bottle.",
      "object_motion": "The yellow-capped bottle is moved from the right side of the tray to inside the tray.",
      "physical_logic": "The robotic arm transports the bottle to the tray."
    },
    {
      "segment_index": 2,
      "time_range": "0:05-0:09",
      "description": "The robotic arm places the yellow-capped bottle into the blue tray and retracts.",
      "key_changes": "The yellow-capped bottle is placed into the tray, and the green-capped bottle falls over.",
      "camera": "Static top-down shot.",
      "robot_motion": "The robotic arm places the bottle and retracts.",
      "object_motion": "The yellow-capped bottle is placed into the tray, and the green-capped bottle falls over.",
      "physical_logic": "The robotic arm completes the placement of the bottle."
    }
  ],
  "transitions": [
    "Continuous action with no visible cuts or transitions.",
    "Continuous action with no visible cuts or transitions."
  ],
  "resolution": {
    "W": 1360,
    "H": 688
  },
  "aspect_ratio": "16,0",
  "duration": "9.767s",
  "fps": 30,
  "robotics": {
    "embodiment": "Manipulation robot",
    "end_effector": "Gripper",
    "manipulated_objects": [
      "Yellow-capped bottle",
      "Green-capped bottle"
    ],
    "task_goal": "Place the yellow-capped bottle into the blue tray.",
    "initial_state": "The yellow-capped bottle is on the right side of the tray, and the green-capped bottle is upright next to it.",
    "terminal_state": "The yellow-capped bottle is inside the blue tray, and the green-capped bottle has fallen over.",
    "affordances": "The bottles have caps that can be grasped by the robotic arm.",
    "failure_modes": "The green-capped bottle falls over during the manipulation."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The robotic arm's gripper makes contact with the yellow-capped bottle, grasps it, and lifts it.",
    "support_and_stability": "The blue tray provides a stable surface for the bottles.",
    "force_and_motion": "The robotic arm applies a lifting force to move the bottle, and the force of the movement causes the green-capped bottle to fall over.",
    "causal_chain": "The robotic arm grasps the yellow-capped bottle, lifts it, and places it into the tray, causing the green-capped bottle to fall over.",
    "object_permanence": "The objects remain in the scene, with their positions changing as a result of the robotic arm's actions."
  },
  "uncertainties": [
    "The exact text on the bottle labels is not clearly visible."
  ],
  "comprehensive_t2v_caption": "A top-down view of a white marble countertop with a blue plastic tray in the center. Inside the tray is a yellow-capped bottle with a white body and orange label. To the left of the tray is a white bottle with an orange cap and blue label. To the right of the tray are two more bottles: one with a green cap and white body, and another with a yellow cap and white body with an orange label. A black robotic arm enters from the top right, reaches for the yellow-capped bottle on the right, picks it up, and places it into the blue tray next to the existing bottle. During this action, the green-capped bottle falls over. The video ends with the yellow-capped bottle now inside the tray next to the other bottle."
}
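
Reviewer note: the `actions` entries in the two outputs above do not share the same keys. The Qwen 3.5 27B entries include an `evidence` field, while the Cosmos 3 Nano entries omit it. The sketch below shows one way to flag such differences. It assumes the Qwen 3.5 27B key set as the reference, which is an assumption made for illustration only, since the p08 schema itself is not reproduced in this section; `REFERENCE_ACTION_KEYS` and `diff_keys` are hypothetical names.

```python
# Reference key set taken from the Qwen 3.5 27B action entries above
# (an assumption; the authoritative p08 schema is not shown in this report).
REFERENCE_ACTION_KEYS = {
    "time",
    "description",
    "actor",
    "object",
    "contact_event",
    "state_change",
    "evidence",
}


def diff_keys(entry: dict, reference: set) -> dict:
    """Report which reference keys are missing from an entry and which keys are extra."""
    keys = set(entry)
    return {
        "missing": sorted(reference - keys),
        "extra": sorted(keys - reference),
    }


# First action entry from the Cosmos 3 Nano (16B) output above.
nano_action = {
    "time": "0:00-0:02",
    "description": "The robotic arm enters from the top right and reaches for the yellow-capped bottle on the right.",
    "actor": "Robotic arm",
    "object": "Yellow-capped bottle",
    "contact_event": "The gripper makes contact with the yellow-capped bottle.",
    "state_change": "The yellow-capped bottle is grasped by the robotic arm.",
}

report = diff_keys(nano_action, REFERENCE_ACTION_KEYS)
print(report)
```

The same function could be pointed at other nested objects, such as `physical_reasoning`, where the two outputs above also use different field names (`force_gravity_friction` versus `force_and_motion`).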

Cosmos 3 Super (64B) / p08: Cosmos 3 schema

ok
p08: Cosmos 3 schema · Cosmos 3 Super (64B) · 22.3s · 2026-06-10T16:15:13+0000

Short Read

The video begins with a static top-down view of a white marble surface featuring a blue tray and four bottles. A black robotic arm enters from the top right, grasps the yellow bottle with an orange band, lifts it, moves it over the blue tray, and places it inside next to the existing yellow bottle. The arm then retracts slightly, leaving the yellow bottle in the tray.

Training Caption Field

A top-down view of a white marble surface with a blue plastic tray in the center. Several bottles are scattered around the tray: a white bottle with a red cap on the left, a yellow bottle with a white cap inside the tray, a white bottle with a green cap to the right of the tray, and a yellow bottle with an orange band and white cap to the far right. A black robotic arm with a gripper enters from the top right, grasps the yellow bottle with the orange band, lifts it, moves it over the blue tray, and places it inside next to the existing yellow bottle. The white bottle with the green cap remains stationary and untouched throughout the sequence.

Full output JSON/text
{
  "scene_imagination": "A top-down view of a white marble surface with a blue plastic tray in the center. Several bottles are scattered around the tray: a white bottle with a red cap on the left, a yellow bottle with a white cap inside the tray, a white bottle with a green cap to the right of the tray, and a yellow bottle with an orange band and white cap to the far right. A black robotic arm with a gripper enters from the top right, picks up the yellow bottle with the orange band, moves it over the blue tray, and places it inside next to the existing yellow bottle. The white bottle with the green cap remains stationary and untouched throughout the sequence.",
  "temporal_caption": "The video begins with a static top-down view of a white marble surface featuring a blue tray and four bottles. A black robotic arm enters from the top right, grasps the yellow bottle with an orange band, lifts it, moves it over the blue tray, and places it inside next to the existing yellow bottle. The arm then retracts slightly, leaving the yellow bottle in the tray.",
  "audio_description": "The video is silent, with no audible sounds or music.",
  "subjects": [
    {
      "description": "A black robotic arm with a gripper at the end.",
      "appearance_details": "The arm is black with a white cylindrical section near the base. The gripper has two black prongs.",
      "relationship": "Interacting with the yellow bottle with an orange band.",
      "location": "Enters from the top right, moves to the center, and then retracts.",
      "relative_size": "Large within the frame.",
      "orientation": "Facing downwards towards the surface.",
      "pose": "Extended downwards, then retracted.",
      "action": "Moves into the frame, grasps a bottle, lifts it, moves it to the tray, and releases it.",
      "state_changes": "Starts outside the frame, moves in, grasps the bottle, lifts it, moves it, releases it, and retracts."
    },
    {
      "description": "A yellow bottle with an orange band and a white cap.",
      "appearance_details": "The bottle is cylindrical with a white label and an orange band around the middle. The cap is white and pointed.",
      "relationship": "Initially on the surface, then picked up by the robotic arm and placed in the blue tray.",
      "location": "Starts on the right side of the frame, moves to the center, and ends in the blue tray.",
      "relative_size": "Medium within the frame.",
      "orientation": "Upright.",
      "pose": "Standing upright.",
      "action": "Picked up by the robotic arm, moved, and placed in the tray.",
      "state_changes": "Starts on the surface, is lifted, moved, and placed in the tray."
    },
    {
      "description": "A yellow bottle with a white cap.",
      "appearance_details": "The bottle is cylindrical with a white label and a yellow band around the middle. The cap is white and pointed.",
      "relationship": "Remains stationary inside the blue tray.",
      "location": "Inside the blue tray, center of the frame.",
      "relative_size": "Medium within the frame.",
      "orientation": "Upright.",
      "pose": "Standing upright.",
      "action": "Remains stationary.",
      "state_changes": "No significant change."
    },
    {
      "description": "A white bottle with a red cap.",
      "appearance_details": "The bottle is cylindrical with a white label and a red cap.",
      "relationship": "Remains stationary on the left side of the frame.",
      "location": "Left side of the frame.",
      "relative_size": "Medium within the frame.",
      "orientation": "Upright.",
      "pose": "Standing upright.",
      "action": "Remains stationary.",
      "state_changes": "No significant change."
    },
    {
      "description": "A white bottle with a green cap.",
      "appearance_details": "The bottle is cylindrical with a white label and a green cap.",
      "relationship": "Remains stationary on the right side of the frame.",
      "location": "Right side of the frame, next to the blue tray.",
      "relative_size": "Medium within the frame.",
      "orientation": "Upright.",
      "pose": "Standing upright.",
      "action": "Remains stationary.",
      "state_changes": "No significant change."
    }
  ],
  "background_setting": "A white marble surface with subtle grey veining. A blue plastic tray is placed in the center of the frame.",
  "lighting": {
    "conditions": "Bright, even lighting.",
    "direction": "Top-down lighting.",
    "shadows": "Soft shadows cast by the bottles and the robotic arm.",
    "illumination_effect": "Evenly illuminates the scene with minimal glare."
  },
  "aesthetics": {
    "composition": "Centered composition with the blue tray in the middle and bottles arranged around it.",
    "color_scheme": "High contrast with white, blue, yellow, and black elements.",
    "mood_atmosphere": "Clinical and precise.",
    "patterns": "Subtle grey veining on the white marble surface."
  },
  "cinematography": {
    "camera_motion": "Static camera.",
    "framing": "Top-down view, wide shot.",
    "camera_angle": "Bird's-eye view.",
    "depth_of_field": "Deep depth of field, all objects are in focus.",
    "focus": "Sharp focus on all objects.",
    "lens_focal_length": "Standard lens."
  },
  "style_medium": "3D animation.",
  "artistic_style": "Realistic 3D rendering.",
  "context": "A demonstration of a robotic arm performing a pick-and-place task.",
  "actions": [
    {
      "time": "0:00-0:02",
      "description": "The robotic arm enters the frame from the top right.",
      "actor": "Robotic arm.",
      "object": "None.",
      "contact_event": "None.",
      "state_changes": "Robotic arm enters the frame."
    },
    {
      "time": "0:02-0:04",
      "description": "The robotic arm moves towards the yellow bottle with the orange band.",
      "actor": "Robotic arm.",
      "object": "Yellow bottle with orange band.",
      "contact_event": "The robotic arm approaches the bottle.",
      "state_changes": "The robotic arm moves closer to the bottle."
    },
    {
      "time": "0:04-0:06",
      "description": "The robotic arm grasps the yellow bottle with the orange band.",
      "actor": "Robotic arm.",
      "object": "Yellow bottle with orange band.",
      "contact_event": "The robotic arm makes contact with the bottle.",
      "state_changes": "The robotic arm grasps the bottle."
    },
    {
      "time": "0:06-0:08",
      "description": "The robotic arm lifts the yellow bottle with the orange band and moves it over the blue tray.",
      "actor": "Robotic arm.",
      "object": "Yellow bottle with orange band.",
      "contact_event": "The robotic arm lifts the bottle.",
      "state_changes": "The bottle is lifted and moved."
    },
    {
      "time": "0:08-0:10",
      "description": "The robotic arm places the yellow bottle with the orange band inside the blue tray.",
      "actor": "Robotic arm.",
      "object": "Yellow bottle with orange band.",
      "contact_event": "The robotic arm releases the bottle.",
      "state_changes": "The bottle is placed in the tray."
    }
  ],
  "text_and_signage_elements": [
    {
      "text": "Unreadable text on bottle labels.",
      "category": "physical_in_scene",
      "appearance": "Small, printed text on the labels of the bottles.",
      "spatial": "On the labels of the bottles.",
      "spatial_temporal": "Static on the bottles throughout the video.",
      "context": "Product labels."
    }
  ],
  "segments": [
    {
      "segment_index": 0,
      "time_range": "0:00-0:02",
      "description": "The robotic arm enters the frame.",
      "key_changes": "Robotic arm enters.",
      "camera": "Static top-down view.",
      "robot_motion": "Robotic arm moves into the frame.",
      "object_motion": "None.",
      "physical_logic": "None."
    },
    {
      "segment_index": 1,
      "time_range": "0:02-0:06",
      "description": "The robotic arm approaches and grasps the yellow bottle with the orange band.",
      "key_changes": "Robotic arm grasps the bottle.",
      "camera": "Static top-down view.",
      "robot_motion": "Robotic arm moves towards the bottle and grasps it.",
      "object_motion": "None.",
      "physical_logic": "None."
    },
    {
      "segment_index": 2,
      "time_range": "0:06-0:10",
      "description": "The robotic arm lifts the bottle, moves it over the tray, and places it inside.",
      "key_changes": "Bottle is moved and placed in the tray.",
      "camera": "Static top-down view.",
      "robot_motion": "Robotic arm lifts, moves, and places the bottle.",
      "object_motion": "Bottle is moved.",
      "physical_logic": "None."
    }
  ],
  "transitions": [
    "Continuous action with no cuts or transitions."
  ],
  "resolution": {
    "W": 1360,
    "H": 688
  },
  "aspect_ratio": "16,9",
  "duration": "9.767s",
  "fps": 30.0,
  "robotics": {
    "embodiment": "A black robotic arm with a gripper.",
    "end_effector": "A gripper with two black prongs.",
    "manipulated_objects": [
      "Yellow bottle with orange band."
    ],
    "task_goal": "Pick up the yellow bottle with the orange band and place it in the blue tray.",
    "initial_state": "The robotic arm is outside the frame. The yellow bottle with the orange band is on the surface.",
    "terminal_state": "The robotic arm has placed the yellow bottle with the orange band in the blue tray.",
    "affordances": "The robotic arm can grasp, lift, move, and release objects.",
    "failure_modes": "None observed."
  },
  "physical_reasoning": {
    "contact_and_grasp": "The robotic arm successfully grasps the yellow bottle with the orange band.",
    "support_and_stability": "The robotic arm provides stable support for the bottle during the movement.",
    "force_gravity_friction": "The robotic arm applies sufficient force to lift the bottle against gravity.",
    "causal_chain": "The robotic arm enters the frame, grasps the bottle, lifts it, moves it, and places it in the tray.",
    "object_permanence": "The bottle is moved from the surface to the tray.",
    "next_action_prediction": "The robotic arm retracts from the frame."
  },
  "uncertainties": [],
  "comprehensive_t2v_caption": "A top-down view of a white marble surface with a blue plastic tray in the center. Several bottles are scattered around the tray: a white bottle with a red cap on the left, a yellow bottle with a white cap inside the tray, a white bottle with a green cap to the right of the tray, and a yellow bottle with an orange band and white cap to the far right. A black robotic arm with a gripper enters from the top right, grasps the yellow bottle with the orange band, lifts it, moves it over the blue tray, and places it inside next to the existing yellow bottle. The white bottle with the green cap remains stationary and untouched throughout the sequence."
}
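
The "actions" and "segments" lists in the output above carry "M:SS-M:SS" time ranges alongside a "duration" of "9.767s". A reviewer may want to confirm that those ranges are contiguous and do not run far past the clip length. The following is a minimal sketch of such a check, not part of the captioning pipeline. The helper names (parse_mmss, parse_range, check_ranges) and the 0.5 s tolerance are assumptions made for illustration; only the range strings and the duration are copied from the output.

```python
def parse_mmss(stamp: str) -> float:
    """Convert an 'M:SS' stamp such as '0:06' to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def parse_range(time_range: str) -> tuple[float, float]:
    """Split 'M:SS-M:SS' into (start_seconds, end_seconds)."""
    start, end = time_range.split("-")
    return parse_mmss(start), parse_mmss(end)


def check_ranges(ranges, duration_sec, tolerance=0.5):
    """Return a list of human-readable problems; an empty list means none found.

    tolerance (seconds) is a hypothetical allowance for captions that round
    the final timestamp up to the next whole second.
    """
    problems = []
    previous_end = 0.0
    for text in ranges:
        start, end = parse_range(text)
        if end <= start:
            problems.append(f"{text}: end is not after start")
        if abs(start - previous_end) > 1e-9:
            problems.append(
                f"{text}: gap or overlap, previous range ended at {previous_end:g}s"
            )
        previous_end = end
    if previous_end > duration_sec + tolerance:
        problems.append(
            f"last range ends at {previous_end:g}s, beyond duration {duration_sec:g}s"
        )
    return problems


# Values copied from the "actions", "segments", and "duration" fields above.
action_ranges = ["0:00-0:02", "0:02-0:04", "0:04-0:06", "0:06-0:08", "0:08-0:10"]
segment_ranges = ["0:00-0:02", "0:02-0:06", "0:06-0:10"]
duration = float("9.767s".rstrip("s"))

print(check_ranges(action_ranges, duration))   # []
print(check_ranges(segment_ranges, duration))  # []
```

Both lists pass only because of the tolerance: the final range ends at 0:10, slightly past the 9.767 s clip length. With a tolerance of 0, the same call would report the last range as running beyond the duration.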

Cosmos 3 Super (64B) / p09: Reproduction schema

ok
p09: Reproduction schema · Cosmos 3 Super (64B) · 30.2s · 2026-06-10T16:15:44+0000

Short Read

A black robot arm with a multi-fingered gripper enters from the right, picks up a yellow-capped bottle from the table, and places it into a blue plastic bin containing another similar bottle, while a white-capped bottle and a green-capped bottle remain on the table.

Training Caption Field

A top-down view shows a white marble-patterned table with a blue plastic bin centered, containing one yellow-capped bottle. A white-capped bottle is on the left, and a green-capped bottle and a yellow-capped bottle are on the right. A black robot arm enters from the top right, moves to the right, grasps the yellow-capped bottle, lifts it, moves it to the center, places it into the blue bin, and then retracts to the top right. The final state shows the blue bin containing two yellow-capped bottles, with the white-capped bottle on the left and the green-capped bottle on the right.

Full output JSON/text
{
  "schema_version": "p09_reproduction_v1",
  "one_sentence_summary": "A black robot arm with a multi-fingered gripper enters from the right, picks up a yellow-capped bottle from the table, and places it into a blue plastic bin containing another similar bottle, while a white-capped bottle and a green-capped bottle remain on the table.",
  "video_metadata": {
    "resolution": {
      "W": 1360,
      "H": 688
    },
    "aspect_ratio": "16,9",
    "duration_sec": 9.767,
    "fps": 30.0,
    "frame_count": 293
  },
  "camera": {
    "viewpoint": "top-down",
    "framing": "medium",
    "camera_angle": "bird's-eye",
    "camera_motion": "static",
    "estimated_height_or_distance": "1.5m",
    "lens_or_distortion": "none",
    "crop_boundaries": "none",
    "stability": "stable"
  },
  "scene_layout": {
    "coordinate_frame": "image",
    "background": "white marble-patterned surface",
    "surface": "white marble-patterned table",
    "lighting": "even, bright, diffused",
    "global_spatial_map": "blue bin centered, white-capped bottle on left, green-capped bottle on right, yellow-capped bottle initially on right then moved to bin",
    "occlusions": "none"
  },
  "subjects": [
    {
      "id": "robot_arm",
      "category": "robot arm",
      "appearance": "black, multi-jointed, with a black multi-fingered gripper",
      "visible_parts": "arm, gripper",
      "initial_pose": "entering from top right",
      "final_pose": "retracted to top right",
      "workspace_location": "top right",
      "motion_path": "enters, moves to right, picks up bottle, moves to center, places bottle, retracts",
      "end_effector": "black multi-fingered gripper",
      "uncertainties": null
    }
  ],
  "objects": [
    {
      "id": "blue_bin",
      "category": "bin",
      "appearance": "blue plastic, rectangular, with white handles",
      "label_or_text": "none",
      "material": "plastic",
      "shape": "rectangular",
      "size_relative_to_frame": "medium",
      "initial_pose": "center",
      "final_pose": "center",
      "state_changes": "contains one bottle initially, contains two bottles finally",
      "uncertainties": null
    },
    {
      "id": "white_capped_bottle",
      "category": "bottle",
      "appearance": "white body, orange cap, label with text",
      "label_or_text": "illegible",
      "material": "plastic",
      "shape": "cylindrical",
      "size_relative_to_frame": "small",
      "initial_pose": "left side",
      "final_pose": "left side",
      "state_changes": "none",
      "uncertainties": null
    },
    {
      "id": "yellow_capped_bottle_1",
      "category": "bottle",
      "appearance": "white body, yellow cap, label with text",
      "label_or_text": "illegible",
      "material": "plastic",
      "shape": "cylindrical",
      "size_relative_to_frame": "small",
      "initial_pose": "inside blue bin",
      "final_pose": "inside blue bin",
      "state_changes": "none",
      "uncertainties": null
    },
    {
      "id": "yellow_capped_bottle_2",
      "category": "bottle",
      "appearance": "white body, yellow cap, label with text",
      "label_or_text": "illegible",
      "material": "plastic",
      "shape": "cylindrical",
      "size_relative_to_frame": "small",
      "initial_pose": "right side",
      "final_pose": "inside blue bin",
      "state_changes": "moved from right side to inside blue bin",
      "uncertainties": null
    },
    {
      "id": "green_capped_bottle",
      "category": "bottle",
      "appearance": "white body, green cap, label with text",
      "label_or_text": "illegible",
      "material": "plastic",
      "shape": "cylindrical",
      "size_relative_to_frame": "small",
      "initial_pose": "right side",
      "final_pose": "right side",
      "state_changes": "none",
      "uncertainties": null
    }
  ],
  "initial_state": "blue bin centered with one yellow-capped bottle inside, white-capped bottle on left, green-capped bottle and yellow-capped bottle on right, robot arm entering from top right",
  "final_state": "blue bin centered with two yellow-capped bottles inside, white-capped bottle on left, green-capped bottle on right, robot arm retracted to top right",
  "timeline": [
    {
      "time_range": "0:00-0:02",
      "description": "robot arm enters from top right and moves towards the yellow-capped bottle on the right",
      "active_subjects": [
        "robot_arm"
      ],
      "object_states": {
        "blue_bin": "center, one yellow-capped bottle inside",
        "white_capped_bottle": "left side",
        "yellow_capped_bottle_1": "inside blue bin",
        "yellow_capped_bottle_2": "right side",
        "green_capped_bottle": "right side"
      },
      "robot_motion": "entering, moving right",
      "object_motion": "none",
      "gripper_state": "open",
      "contact_state": "none",
      "spatial_changes": "robot arm enters frame",
      "physical_causality": "none"
    },
    {
      "time_range": "0:02-0:04",
      "description": "robot arm grasps the yellow-capped bottle on the right",
      "active_subjects": [
        "robot_arm"
      ],
      "object_states": {
        "blue_bin": "center, one yellow-capped bottle inside",
        "white_capped_bottle": "left side",
        "yellow_capped_bottle_1": "inside blue bin",
        "yellow_capped_bottle_2": "right side, being grasped",
        "green_capped_bottle": "right side"
      },
      "robot_motion": "moving right, closing gripper",
      "object_motion": "yellow_capped_bottle_2 being grasped",
      "gripper_state": "closing",
      "contact_state": "grasping yellow_capped_bottle_2",
      "spatial_changes": "yellow_capped_bottle_2 lifted slightly",
      "physical_causality": "robot arm grasps bottle"
    },
    {
      "time_range": "0:04-0:06",
      "description": "robot arm lifts the yellow-capped bottle and moves it towards the blue bin",
      "active_subjects": [
        "robot_arm"
      ],
      "object_states": {
        "blue_bin": "center, one yellow-capped bottle inside",
        "white_capped_bottle": "left side",
        "yellow_capped_bottle_1": "inside blue bin",
        "yellow_capped_bottle_2": "being moved towards blue bin",
        "green_capped_bottle": "right side"
      },
      "robot_motion": "moving left, lifting",
      "object_motion": "yellow_capped_bottle_2 being moved",
      "gripper_state": "closed",
      "contact_state": "holding yellow_capped_bottle_2",
      "spatial_changes": "yellow_capped_bottle_2 moved from right to center",
      "physical_causality": "robot arm moves bottle"
    },
    {
      "time_range": "0:06-0:08",
      "description": "robot arm places the yellow-capped bottle into the blue bin",
      "active_subjects": [
        "robot_arm"
      ],
      "object_states": {
        "blue_bin": "center, two yellow-capped bottles inside",
        "white_capped_bottle": "left side",
        "yellow_capped_bottle_1": "inside blue bin",
        "yellow_capped_bottle_2": "inside blue bin",
        "green_capped_bottle": "right side"
      },
      "robot_motion": "moving down, opening gripper",
      "object_motion": "yellow_capped_bottle_2 placed in bin",
      "gripper_state": "opening",
      "contact_state": "releasing yellow_capped_bottle_2",
      "spatial_changes": "yellow_capped_bottle_2 placed in bin",
      "physical_causality": "robot arm places bottle"
    },
    {
      "time_range": "0:08-0:10",
      "description": "robot arm retracts to the top right",
      "active_subjects": [
        "robot_arm"
      ],
      "object_states": {
        "blue_bin": "center, two yellow-capped bottles inside",
        "white_capped_bottle": "left side",
        "yellow_capped_bottle_1": "inside blue bin",
        "yellow_capped_bottle_2": "inside blue bin",
        "green_capped_bottle": "right side"
      },
      "robot_motion": "retracting",
      "object_motion": "none",
      "gripper_state": "open",
      "contact_state": "none",
      "spatial_changes": "robot arm leaves frame",
      "physical_causality": "none"
    }
  ],
  "parallel_action_graph": {
    "simultaneous_actions": [],
    "dependencies": [
      {
        "action": "grasp yellow_capped_bottle_2",
        "dependency": "move to right"
      },
      {
        "action": "move to center",
        "dependency": "grasp yellow_capped_bottle_2"
      },
      {
        "action": "place yellow_capped_bottle_2 in bin",
        "dependency": "move to center"
      },
      {
        "action": "retract",
        "dependency": "place yellow_capped_bottle_2 in bin"
      }
    ],
    "ordering_constraints": [
      "enter",
      "move to right",
      "grasp yellow_capped_bottle_2",
      "move to center",
      "place yellow_capped_bottle_2 in bin",
      "retract"
    ]
  },
  "keyframes": [
    {
      "time": 0.0,
      "frame_description": "robot arm enters from top right, blue bin centered with one yellow-capped bottle inside, white-capped bottle on left, green-capped bottle and yellow-capped bottle on right",
      "subject_poses": {
        "robot_arm": "entering from top right"
      },
      "object_poses": {
        "blue_bin": "center, one yellow-capped bottle inside",
        "white_capped_bottle": "left side",
        "yellow_capped_bottle_1": "inside blue bin",
        "yellow_capped_bottle_2": "right side",
        "green_capped_bottle": "right side"
      },
      "contacts": [],
      "important_visual_details": "robot arm entering frame"
    },
    {
      "time": 2.0,
      "frame_description": "robot arm grasps the yellow-capped bottle on the right",
      "subject_poses": {
        "robot_arm": "moving right, closing gripper"
      },
      "object_poses": {
        "blue_bin": "center, one yellow-capped bottle inside",
        "white_capped_bottle": "left side",
        "yellow_capped_bottle_1": "inside blue bin",
        "yellow_capped_bottle_2": "right side, being grasped",
        "green_capped_bottle": "right side"
      },
      "contacts": [
        {
          "subject_id": "robot_arm",
          "object_id": "yellow_capped_bottle_2",
          "contact_type": "grasping"
        }
      ],
      "important_visual_details": "robot arm grasps bottle"
    },
    {
      "time": 4.0,
      "frame_description": "robot arm lifts the yellow-capped bottle and moves it towards the blue bin",
      "subject_poses": {
        "robot_arm": "moving left, lifting"
      },
      "object_poses": {
        "blue_bin": "center, one yellow-capped bottle inside",
        "white_capped_bottle": "left side",
        "yellow_capped_bottle_1": "inside blue bin",
        "yellow_capped_bottle_2": "being moved towards blue bin",
        "green_capped_bottle": "right side"
      },
      "contacts": [
        {
          "subject_id": "robot_arm",
          "object_id": "yellow_capped_bottle_2",
          "contact_type": "holding"
        }
      ],
      "important_visual_details": "robot arm moves bottle"
    },
    {
      "time": 6.0,
      "frame_description": "robot arm places the yellow-capped bottle into the blue bin",
      "subject_poses": {
        "robot_arm": "moving down, opening gripper"
      },
      "object_poses": {
        "blue_bin": "center, two yellow-capped bottles inside",
        "white_capped_bottle": "left side",
        "yellow_capped_bottle_1": "inside blue bin",
        "yellow_capped_bottle_2": "inside blue bin",
        "green_capped_bottle": "right side"
      },
      "contacts": [
        {
          "subject_id": "robot_arm",
          "object_id": "yellow_capped_bottle_2",
          "contact_type": "releasing"
        }
      ],
      "important_visual_details": "robot arm places bottle"
    },
    {
      "time": 8.0,
      "frame_description": "robot arm retracts to the top right",
      "subject_poses": {
        "robot_arm": "retracting"
      },
      "object_poses": {
        "blue_bin": "center, two yellow-capped bottles inside",
        "white_capped_bottle": "left side",
        "yellow_capped_bottle_1": "inside blue bin",
        "yellow_capped_bottle_2": "inside blue bin",
        "green_capped_bottle": "right side"
      },
      "contacts": [],
      "important_visual_details": "robot arm leaves frame"
    }
  ],
  "contact_events": [
    {
      "time": 2.0,
      "subject_id": "robot_arm",
      "object_id": "yellow_capped_bottle_2",
      "contact_type": "grasping",
      "contact_location": "right side",
      "grasp_or_support_quality": "secure",
      "release_or_transfer": "grasp",
      "visible_evidence": "robot arm closes gripper around bottle"
    },
    {
      "time": 6.0,
      "subject_id": "robot_arm",
      "object_id": "yellow_capped_bottle_2",
      "contact_type": "releasing",
      "contact_location": "inside blue bin",
      "grasp_or_support_quality": "secure",
      "release_or_transfer": "release",
      "visible_evidence": "robot arm opens gripper, bottle placed in bin"
    }
  ],
  "physics_and_affordances": {
    "support_surfaces": "white marble-patterned table",
    "gravity_and_stability": "gravity acts downwards, objects remain stable on table",
    "force_or_constraint_reasoning": "robot arm applies force to grasp and lift bottle, gravity pulls bottle down when released",
    "collision_or_near_collision": "none",
    "object_permanence": "all objects remain in frame throughout",
    "task_goal": "move yellow-capped bottle from right side to blue bin",
    "failure_modes": "none",
    "next_action_prediction": "none"
  },
  "appearance_fidelity": {
    "colors": "blue bin, white table, black robot arm, white bottles with colored caps",
    "textures": "plastic bottles, smooth table surface",
    "lighting_shadows": "even lighting, soft shadows",
    "robot_morphology": "black, multi-jointed, with a black multi-fingered gripper",
    "object_fine_details": "labels on bottles, white handles on blue bin",
    "background_fine_details": "white marble-patterned surface"
  },
  "text_and_labels": [
    {
      "object_id": "white_capped_bottle",
      "raw_text": "illegible",
      "legibility": "illegible",
      "appearance": "text on label",
      "spatial_location": "left side",
      "time_visible": "0:00-0:09"
    },
    {
      "object_id": "yellow_capped_bottle_1",
      "raw_text": "illegible",
      "legibility": "illegible",
      "appearance": "text on label",
      "spatial_location": "inside blue bin",
      "time_visible": "0:00-0:09"
    },
    {
      "object_id": "yellow_capped_bottle_2",
      "raw_text": "illegible",
      "legibility": "illegible",
      "appearance": "text on label",
      "spatial_location": "right side, then inside blue bin",
      "time_visible": "0:00-0:09"
    },
    {
      "object_id": "green_capped_bottle",
      "raw_text": "illegible",
      "legibility": "illegible",
      "appearance": "text on label",
      "spatial_location": "right side",
      "time_visible": "0:00-0:09"
    }
  ],
  "audio_description": null,
  "generation_controls": {
    "shot_type": "medium",
    "camera_motion_instruction": "static",
    "motion_style": "smooth",
    "duration_instruction": "9.767s",
    "spatial_consistency_requirements": "consistent",
    "negative_constraints": "none",
    "recommended_caption_style": "descriptive"
  },
  "uncertainties": [],
  "comprehensive_reproduction_caption": "A top-down view shows a white marble-patterned table with a blue plastic bin centered, containing one yellow-capped bottle. A white-capped bottle is on the left, and a green-capped bottle and a yellow-capped bottle are on the right. A black robot arm enters from the top right, moves to the right, grasps the yellow-capped bottle, lifts it, moves it to the center, places it into the blue bin, and then retracts to the top right. The final state shows the blue bin containing two yellow-capped bottles, with the white-capped bottle on the left and the green-capped bottle on the right."
}
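
The p09 output above reuses the same identifiers ("robot_arm", "yellow_capped_bottle_2", and so on) across "subjects", "objects", "timeline", and "contact_events", and reports a "frame_count" next to "duration_sec" and "fps". The sketch below shows one way a reviewer might check that every referenced id is declared and that the frame count agrees with duration times frame rate. It is an illustration under stated assumptions: the function names (unknown_ids, frame_count_matches) are hypothetical, and the JSON is a trimmed excerpt of the output with most fields omitted.

```python
import json

# Trimmed excerpt of the output above; most fields are omitted for brevity.
excerpt = json.loads("""
{
  "video_metadata": {"duration_sec": 9.767, "fps": 30.0, "frame_count": 293},
  "subjects": [{"id": "robot_arm"}],
  "objects": [
    {"id": "blue_bin"},
    {"id": "white_capped_bottle"},
    {"id": "yellow_capped_bottle_1"},
    {"id": "yellow_capped_bottle_2"},
    {"id": "green_capped_bottle"}
  ],
  "timeline": [
    {"time_range": "0:02-0:04",
     "active_subjects": ["robot_arm"],
     "object_states": {"yellow_capped_bottle_2": "right side, being grasped"}}
  ],
  "contact_events": [
    {"time": 2.0, "subject_id": "robot_arm", "object_id": "yellow_capped_bottle_2"},
    {"time": 6.0, "subject_id": "robot_arm", "object_id": "yellow_capped_bottle_2"}
  ]
}
""")


def unknown_ids(doc: dict) -> list[str]:
    """Collect ids used in timeline or contact_events but never declared."""
    subject_ids = {s["id"] for s in doc["subjects"]}
    object_ids = {o["id"] for o in doc["objects"]}
    missing = []
    for entry in doc["timeline"]:
        missing += [s for s in entry["active_subjects"] if s not in subject_ids]
        missing += [o for o in entry["object_states"] if o not in object_ids]
    for event in doc["contact_events"]:
        if event["subject_id"] not in subject_ids:
            missing.append(event["subject_id"])
        if event["object_id"] not in object_ids:
            missing.append(event["object_id"])
    return missing


def frame_count_matches(meta: dict) -> bool:
    """True when frame_count equals duration_sec * fps rounded to the nearest frame."""
    return round(meta["duration_sec"] * meta["fps"]) == meta["frame_count"]


print(unknown_ids(excerpt))                            # []
print(frame_count_matches(excerpt["video_metadata"]))  # True
```

A structural check like this only confirms that ids resolve. It would not catch a semantic mismatch inside a single entry, such as the object with id "white_capped_bottle" whose "appearance" field lists an orange cap.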