Python group numbers into ranges

I referred Unsupervised clustering with unknown number of clusters as suggested by @schwobaseggl and changed the code a bit as per my need. Here is the new code:

import numpy
import scipy.cluster.hierarchy as hcluster

temp_data = [31,68,74,46,47,83,29,11,9,52,1272,1288,1297,1285,1294,1251,1265,1257,1280,1265,1292,1297,1271,1273,1253,1273,1291,1251,1295,1298,1264,1281,1294,1280,1250,1279,1298,1290,1294,1299,1266,1260,1298,1292,1280,1259,1266,1276,1253,1252,1271,1280,1284,1266,1254,1259,1291,1268,1253,1298,1288,1271,1298,1300,1274,1294,1263,1298,1270,1254,1266,1269,1283,1285,1286,1276,1257,1266,1272,1298,1261,1251,1272,1260,1291,1269,1260,1294,1287,1256,1253,1284,1269,1287,1292,1269,1272,1275,1250,1289,56,35,19,80,47,22,92,8,10,24,87,76,60,63,64,0,1295,1268,1280,1281,1277,1300,1278,1273,1250,1296,1266,1269,1282,1281,1272,1260,1292,1272,1253,1255,1299,1269,1268,1294,1250,1299,1292,1254,1281,1289,1259,1290,1271,1280,1272,1300,1258,1290,1289,1300,1299,1261,1300,1276,1290,1299,1280,1267,1283,1282,1269,1260,1285,1252,1250,1263,1297,1300,1292,1266,1260,1263,1292,1296,1289,1297,1251,1261,1250,1294,1278,1284,1291,1281,1269,1261,1257,1267,1265,1288,1291,1257,1296,1251,1260,1272,1294,1285,1269,1283,1297,1287,1253,1292,1299,1295,1286,1288,1283,1290,20,73,81,6,49,88,96,61,49,94,57,16,61,16,17,19,1280,1257,1259,1277,1257,1262,1263,1280,1292,1250,1287,1272,1258,1253,1285,1285,1257,1291,1273,1260,1267,1250,1280,1281,1263,1269,1292,1250,1282,1263,1274,1288,1296,1266,1291,1271,1273,1281,1261,1289,1269,1287,1296,1283,1280,1298,1259,1270,1259,1289,1269,1284,1295,1297,1256,1300,1281,1296,1284,1288,1285,1296,1277,1251,1279,1295,1281,1264,1280,1263,69,8,30,45,89,61,80,45,9,18,19,11,1255,1299,1296,1293,1287,1250,1265,1291,1281,1250,1286,1286,1251,1287,1266,1288,1254,1260,1260,1254,1267,1299,1273,1250,1300,1250,1279,1255,1293,1292,1278,1277,1252,1299,1278,1258,1268,1274,1285,1258,1279,1270,1278,1286,1278,1253,1267,1300,1295,1298,1285,1288,1274,1272,1252,1256,1283,1289,1251,1258,1253,1257,1297,1269,1292,1253,1273,1281,1251,1280,1253,1274,1275,1287,1296,1298,1296,1291,1284,1261,1267,1290,1273,1281,1263,1270,1264,1269,1278,1284,67,8,40,59,97,64,45,72,45,90,94,7,33,58,97,97,1252,1297,1265,1278,1272,1252,1258,1261,1287,1260,1260,1258,1280,1263,1256,1296,1269,1270,1296,1282,696,678,665,700,700,691,689,688,650,663,662,698,655,660,662,684,690,657,653,663,670,691,687,675,694,670,676,659,661,664,664,689,683,675,687,691,676,659,689,657,659,656,654,679,669,687,666,662,691,1260,1276,1252,1295,1257,1277,1281,1257,1295,1269,1265,1290,1266,1269,1286,1254,1260,1265,1290,1294,1286,1279,1254,1256,1276,1285,1282,1251,1282,1261,1253,56,74,85,94,18,83,38,80,8,4,78,43,7,79,68,78,1275,1250,1268,1297,1284,1255,1294,1262,1250,1252,680,693,677,676,670,653,670,661,658,695,665,671,656,686,662,691,675,658,671,650,667,653,652,686,667,682,694,654,689,682,667,658,651,652,692,652,655,651,650,698,655,650,679,672,697,696,696,683,1277,1264,1274,1260,1285,1285,1283,1259,1260,1288,1281,1284,1281,1257,1285,1295,1273,1264,1283,1284,1300,1299,1257,1297,1254,1257,1270,1257,1295,34,5,73,42,27,36,91,85,19,50,34,21,73,38,18,73]

ndata = [[td, td] for td in temp_data]
data = numpy.array[ndata]

# clustering
thresh = [11.0/100.0] * [max[temp_data] - min[temp_data]]  #Threshold 11% of the total range of data

clusters = hcluster.fclusterdata[data, thresh, criterion="distance"]

total_clusters = max[clusters]

clustered_index = []
for i in range[total_clusters]:
    clustered_index.append[[]]

for i in range[len[clusters]]:
    clustered_index[clusters[i] - 1].append[i]

clustered_range = []
for x in clustered_index:
    clustered_index_x = [temp_data[y] for y in x]
    clustered_range.append[[min[clustered_index_x] , max[clustered_index_x]]]

print clustered_range

i have chosen threshold value [thres] as 11% of the total range of data

so the output for this dataset is:

[[0, 97], [1250, 1300], [650, 700]]

Your use of the dictionary seems to be a way to allow the numbers to arrive out-of-order. Rather than sort them, you seem to be trying to use a dictionary [and associated hashing] to maximize efficiency.

This doesn't really work out perfectly, since you wind up doing sequential searches for a given value.

Hashing a low:high range [a dictionary key:value pair] to avoid a search doesn't help much. Only they key gets hashed. It does help in the case where you're extending a range at the low end. But for extending a range at the high end, you have to resort to searches of the dictionary values.

What you're really creating is a collection of "partitions". Each partition is bounded by a low and high value. Rather than a dictionary, you can also use a trivial tuple of [low, high]. To be most Pythonic, the [low,high] pair includes the low, but does not include the high. It's a "half-open interval".

Here's a version using a simple set of tuples, relying on hashes instead of bisection.

A binary search [using the bisect module] may perform well, also. It would slightly simplify adjacent range merging, since the two ranges would actually be adjacent. However, this leads to a cost in restructuring the sequence.

You start with an empty set of partitions, the first number creates a trivial partition of just that number.

Each next number can lead to one of three things.

It extends an existing partition on the low end or high end. The number is adjacent to exactly one range.
It creates a new partition. The number is not adjacent to any range.
It "bridges" two adjacent partitions, combining them into one. The number is adjacent to two ranges.

It's something like this.

def make_ranges[numberlist]:
    ranges=set[]
    for number in numberlist:
        print ranges, number
        adjacent_to = set[ r for r in ranges if r[0]-1 == number or r[1] == number ]
        if len[adjacent_to] == 0:
            # Add a new partition.
            r = [number,number+1]
            assert r[0]


				
					

                 
	Bài Viết Liên Quan
	
	 	
		
		   
		   
		   
		
		
			How to fetch data from database in python and display in html table

		
	

		
		
		   
		   
		   
		
		
			Hướng dẫn dùng sha1 encrypt trong PHP

		
	

		
		
		   
		   
		   
		
		
			Hướng dẫn sum map python

		
	

		
		
		   
		   
		   
		
		
			Hướng dẫn dùng types html trong PHP

		
	

		
		
		   
		   
		   
		
		
			Hướng dẫn dùng seaborn facetgrid python

		
	

		
		
		   
		   
		   
		
		
			Xem tuổi mua nhà năm 2023

		
	

		
		
		   
		   
		   
		
		
			Get single record in mysqli php

		
	

		
		
		   
		   
		   
		
		
			Hướng dẫn open new window html

		
	

		
		
		   
		   
		   
		
		
			Hướng dẫn phpmyadmin symbolic link

		
	

		
		
		   
		   
		   
		
		
			What is the difference between command create database and create schema in mysql?

		
	

		
		
		   
		   
		   
		
		
			Hướng dẫn wavelet transform python

		
	

		
		
		   
		   
		   
		
		
			How do you modify a json object in python?

		
	

		
		
		   
		   
		   
		
		
			Python replace none in string

		
	

		
		
		   
		   
		   
		
		
			Hướng dẫn php sql injection

		
	

		
		
		   
		   
		   
		
		
			Hướng dẫn rotate a list python

		
	

		
		
		   
		   
		   
		
		
			Hướng dẫn dùng df drop_duplicates python

		
	

		
		
		   
		   
		   
		
		
			Remove html tags from string c#

		
	

		
		
		   
		   
		   
		
		
			Thí sinh hoa hậu thế giới việt nam 2023

		
	

		
		
		   
		   
		   
		
		
			Array of list to array python

		
	

		
		
		   
		   
		   
		
		
			Create object in object javascript

		
	

	
	




Toplist mới

 
	
	 
		#1
		
			Top 4 uống nước chanh sả mật ong có tác dụng gì 2023
			6 tháng trước
		
	



	
	 
		#2
		
			Top 10 bài tập làm văn số 5 lớp 7 de 4 2023
			6 tháng trước
		
	



	
	 
		#3
		
			Top 3 vừa chơi đã có tài khoản vương giả chap 1 2023
			6 tháng trước
		
	



	
	 
		#4
		
			Top 6 anh sẽ on thôi cover phạm nguyên ngọc lyrics 2023
			6 tháng trước
		
	



	
	 
		#5
		
			Top 7 tài liệu quản lý nhà nước và quản lý ngành giáo dục đào tạo 2023
			6 tháng trước
		
	



	
	 
		#6
		
			Top 7 hãy ra khỏi người đó đi hợp âm 2023
			6 tháng trước
		
	



	
	 
		#7
		
			Top 6 giáo án thơ về thăm nhà bác 2023
			6 tháng trước
		
	



	
	 
		#8
		
			Top 8 giáo án ngữ văn 6 cánh diều 2023
			6 tháng trước
		
	



	
	 
		#9
		
			Top 9 tinh bột tham gia phản ứng nào 2023
			6 tháng trước
		
	






		


	Bài mới nhất
	
	 	
		
		   
		   
		   
		
		
			Chi phí xăng dầu cho vào tài khoản nào năm 2024

		
	

		
		
		   
		   
		   
		
		
			Bảng hệ thống tài khoản kế toán mới nhất 2023 năm 2024

		
	

		
		
		   
		   
		   
		
		
			Mũi va mengoc bc2 là mũi gì năm 2024

		
	

		
		
		   
		   
		   
		
		
			Cung bọ cạp còn gọi là cung gì năm 2024

		
	

		
		
		   
		   
		   
		
		
			14 tháng 1 năm 2023 là ngày gì năm 2024

		
	

		
		
		   
		   
		   
		
		
			Xài những sản phẩm gì của sum là tốt nhất năm 2024

		
	

		
		
		   
		   
		   
		
		
			Http www.animesex.pro en search highschool 20dxd-top 1.html năm 2024

		
	

		
		
		   
		   
		   
		
		
			Hiền tài sinh lỗi số trong tam quốc diễn nghĩa năm 2024

		
	

		
		
		   
		   
		   
		
		
			Bị hạt gạo ở móng tay là bị gì năm 2024

		
	

		
		
		   
		   
		   
		
		
			Chứng nhận bổ túc văn hóa tiếng anh là gì năm 2024

		
	

	
	
                 
	Chủ Đề
	
	
	
		  programming
		  Hỏi Đáp
		  Mẹo Hay
		  Toplist
		  Là gì
		  Địa Điểm Hay
		  Học Tốt
		  mẹo hay
		  Công Nghệ
		  Nghĩa của từ
		  Bao nhiêu
		  Khỏe Đẹp
		  đánh giá
		  Top List
		  bao nhieu
		  bao nhiêu
		  hướng dẫn
		  So Sánh
		  Tiếng anh
		  So sánh
		  Bài tập
		  Xây Đựng
		  Sản phẩm tốt
		  Ngôn ngữ
		  Bài Tập
		  Máy tính
		  javascript
		  Ở đâu
		  Hướng dẫn
		  Thế nào
		  Dịch 
		  Tại sao
		  Đại học
		  Món Ngon
		  Facebook
		  Khoa Học