Guide: computing cosine similarity in Python

    In this article, we compute the cosine similarity between two non-zero vectors. Here a vector is a one-dimensional NumPy array. Cosine similarity measures how closely two vectors point in the same direction, and it is often used to measure document similarity in text analysis. We use the formula below to compute it.

    Similarity = (A.B) / (||A|| * ||B||)

    where A and B are vectors:

    • A.B is the dot product of A and B: the sum of the element-wise products of A and B.
    • ||A|| is the L2 norm of A: the square root of the sum of the squares of the elements of A. ||B|| is defined the same way for B.
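
    To make the formula concrete, here is a minimal sketch that evaluates it term by term in plain Python, without NumPy. The function name `cosine_similarity` and the sample vectors are illustrative assumptions and do not come from the original examples.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # A.B: sum of the element-wise products
    dot = sum(x * y for x, y in zip(a, b))
    # ||A|| and ||B||: square root of the sum of squares
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1, 0], [0, 1]))  # orthogonal vectors -> 0.0
print(cosine_similarity([1, 2], [2, 4]))  # parallel vectors, approximately 1.0
```

    The zero-vector check reflects the requirement that both vectors be non-zero: with a zero norm the denominator vanishes and the similarity is undefined.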

    Example 1:

    In the example below we compute the cosine similarity between two vectors (1-D NumPy arrays). Plain Python lists could also be used to define the vectors.

    Python

    import numpy as np
    from numpy.linalg import norm

    # Two 1-D vectors of equal length
    A = np.array([2, 1, 2, 3, 2, 9])
    B = np.array([3, 4, 2, 4, 5, 5])
    print("A:", A)
    print("B:", B)

    # Dot product divided by the product of the L2 norms
    cosine = np.dot(A, B) / (norm(A) * norm(B))
    print("Cosine Similarity:", cosine)

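    Output (the two arrays are echoed first; the similarity is shown here rounded to four decimal places):

    Cosine Similarity: 0.8189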

    Example 2:

    In the example below we compute the cosine similarity between a batch of three vectors (a 2-D NumPy array) and a single vector (a 1-D NumPy array).

    Python

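    import numpy as np
    from numpy.linalg import norm

    # A holds three vectors, one per row; B is a single vector
    A = np.array([[2, 1, 2], [3, 2, 9], [-1, 2, -3]])
    B = np.array([3, 4, 2])
    print("A:\n", A)
    print("B:\n", B)

    # np.dot(A, B) gives one dot product per row; axis=1 gives one norm per row
    cosine = np.dot(A, B) / (norm(A, axis=1) * norm(B))
    print("Cosine Similarity:\n", cosine)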

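    Output (the two arrays are echoed first; the three similarities, rounded to four decimal places, are 0.8666, 0.6704 and -0.0496).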

    Notice that A holds three vectors and B is a single vector, so the cosine similarity array has three elements. The first element is the cosine similarity between the first row of A and B, the second element is the similarity between the second row of A and B, and likewise for the third.
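
    As a sanity check on that reading, the sketch below recomputes the same three values one row at a time with the two-vector formula and compares them with the batched result. The names `batched` and `per_row` are illustrative assumptions.

```python
import numpy as np
from numpy.linalg import norm

A = np.array([[2, 1, 2], [3, 2, 9], [-1, 2, -3]])
B = np.array([3, 4, 2])

# Batched form: one matrix-vector product, row norms taken along axis=1
batched = np.dot(A, B) / (norm(A, axis=1) * norm(B))

# Reference form: apply the two-vector formula to each row of A in turn
per_row = np.array([np.dot(row, B) / (norm(row) * norm(B)) for row in A])

print(np.allclose(batched, per_row))  # True
```

    The batched form avoids the Python-level loop, which is usually the reason to prefer it when A has many rows.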

    Example 3:

    In the example below we compute the cosine similarity between two 2-D arrays, each holding three vectors. Rows are compared pairwise, so the dot product of each pair of rows is computed as the sum of their element-wise products.

    Python

    import numpy as np
    from numpy.linalg import norm

    # Each array holds three vectors, one per row
    A = np.array([[1, 2, 2],
                  [3, 2, 2],
                  [-2, 1, -3]])
    B = np.array([[4, 2, 4],
                  [2, -2, 5],
                  [3, 4, -4]])
    print("A:\n", A)
    print("B:\n", B)

    # Row-wise dot products divided by the product of the row-wise L2 norms
    cosine = np.sum(A * B, axis=1) / (norm(A, axis=1) * norm(B, axis=1))
    print("Cosine Similarity:\n", cosine)

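    Output (the two arrays are echoed first; the three similarities, rounded to four decimal places, are 0.8889, 0.5066 and 0.4174).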

    The first element of the cosine similarity array is the similarity between the first rows of A and B. Likewise, the second and third elements are the similarities between the second rows and between the third rows, respectively.
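
    The row-wise dot product can be written in more than one way. The sketch below, with illustrative variable names that are assumptions rather than part of the original example, shows two alternatives to the sum of element-wise products and checks that they agree.

```python
import numpy as np
from numpy.linalg import norm

A = np.array([[1, 2, 2], [3, 2, 2], [-2, 1, -3]])
B = np.array([[4, 2, 4], [2, -2, 5], [3, 4, -4]])

# Multiply element-wise, then sum along each row
row_dots = np.sum(A * B, axis=1)

# The same quantity with einsum: sum over j for each row index i
row_dots_einsum = np.einsum("ij,ij->i", A, B)

# np.dot(A, B.T) computes all 3x3 row pairings; the diagonal holds the matching rows
row_dots_diag = np.diag(np.dot(A, B.T))

cosine = row_dots / (norm(A, axis=1) * norm(B, axis=1))
print(row_dots, row_dots_einsum, row_dots_diag)
print(cosine)
```

    Note that calling `np.dot(A, B)` directly on the two 2-D arrays would perform a matrix product rather than pairing row i of A with row i of B, which is why the element-wise form is used here.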

