Code to retrieve "points" like in the demo?

#1
by deepboothcells - opened

Could you provide code to retrieve the points when we ask Molmo to specifically show something in an image (like in the demo, in conjunction with Segment Anything)?

The points are returned as plain text in image coordinates, normalized to between 0 and 100. So it's just a matter of parsing them out and de-normalizing them. We can add more official code for this, but for now you can use this:

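import re

import numpy as np


def extract_points(molmo_output, image_w, image_h):
    """Parse Molmo's point output and return pixel coordinates as NumPy arrays."""
    all_points = []
    # Matches x="..." y="..." as well as numbered pairs such as x1="..." y1="..."
    for match in re.finditer(r'x\d*="\s*([0-9]+(?:\.[0-9]+)?)"\s+y\d*="\s*([0-9]+(?:\.[0-9]+)?)"', molmo_output):
        try:
            point = [float(match.group(i)) for i in range(1, 3)]
        except ValueError:
            # Skip coordinates that cannot be parsed as floats
            pass
        else:
            point = np.array(point)
            if np.max(point) > 100:
                # Treat as an invalid output
                continue
            # De-normalize from the 0-100 range to pixel coordinates
            point /= 100.0
            point = point * np.array([image_w, image_h])
            all_points.append(point)
    return all_points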
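
For a quick check of the parsing and de-normalization without NumPy, here is a minimal standard-library sketch of the same idea. The function name `extract_pixel_points` and the two sample strings are hypothetical, made up for illustration; the sample strings only assume that the model emits `x`/`y` (or numbered `x1`/`y1`, ...) attributes as described above, so check them against real Molmo output before relying on this.

```python
import re

# Same pattern as in the reply above: matches x="..." y="..." pairs,
# including numbered ones such as x1="..." y1="...".
POINT_PATTERN = re.compile(
    r'x\d*="\s*([0-9]+(?:\.[0-9]+)?)"\s+y\d*="\s*([0-9]+(?:\.[0-9]+)?)"'
)


def extract_pixel_points(molmo_output, image_w, image_h):
    """Return a list of (x, y) pixel tuples parsed from the model's text output."""
    points = []
    for match in POINT_PATTERN.finditer(molmo_output):
        x, y = float(match.group(1)), float(match.group(2))
        if max(x, y) > 100:
            # Coordinates are normalized to 0-100, so treat larger values as invalid
            continue
        # De-normalize from the 0-100 range to pixel coordinates
        points.append((x / 100.0 * image_w, y / 100.0 * image_h))
    return points


# Hypothetical sample outputs (assumed format, not copied from a real run)
single = '<point x="50.0" y="25.0" alt="dog">dog</point>'
multi = '<points x1="10.0" y1="20.0" x2="30.5" y2="40.0" alt="cats">cats</points>'

print(extract_pixel_points(single, 640, 480))  # [(320.0, 120.0)]
print(extract_pixel_points(multi, 640, 480))   # two points, one per x/y pair
```

The resulting pixel coordinates can then be passed as point prompts to a segmentation model, with `image_w` and `image_h` taken from the image that was given to Molmo.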
