Optimisation de requête update

**fafa63** · 02/02/2016, 17h48

Bonjour,

Je réalise de nombreuses requêtes de mise en jour sur une base postgres. Les tables mises à jour possèdent plusieurs millions de lignes (entre 3 et 4 millions). Le temps de mise à jour me semble particulièrement long... Mais, est ce que je fais bien les choses ???

Pour exemple, voici une requête type :

Code :

Sélectionner tout - Visualiser dans une fenêtre à part

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
WITH ident_surf_bati_uf AS ( 
 SELECT edi_parc.id_uf,
              SUM(edi_parc.surf_bati_dur) AS surf_bati_dur,
              SUM(edi_parc.surf_bati_leger) AS surf_bati_leger,
              SUM(edi_parc.surf_bati_total) AS surf_bati_tot,
              SUM(edi_parc.surf_resultante_tt_bati) AS surf_resultante_tt_bati
   FROM cad_2015.edi_parc
  GROUP BY edi_parc.id_uf)
 
 
UPDATE cad_2015.edi_parc_uf
  SET surf_bati_leger = ident_surf_bati_uf.surf_bati_leger,
      surf_bati_dur = ident_surf_bati_uf.surf_bati_dur,
      surf_bati_tot = ident_surf_bati_uf.surf_bati_tot,
      surf_resultante_tt_bati = ident_surf_bati_uf.surf_resultante_tt_bati
FROM 
 ident_surf_bati_uf
WHERE 
  ident_surf_bati_uf.id_uf = edi_parc_uf.id_centr
;

Un explain me renvoi ce résultat :

Code :

Sélectionner tout - Visualiser dans une fenêtre à part

1
2
3
4
5
6
7
8
9
10
11
12
 
"Update on edi_parc_uf  (cost=4198990.84..6334634.17 rows=850593 width=1510)"
"  CTE ident_surf_bati_uf"
"    ->  GroupAggregate  (cost=1073593.37..1130730.27 rows=850593 width=76)"
"          ->  Sort  (cost=1073593.37..1081698.53 rows=3242065 width=76)"
"                Sort Key: edi_parc.id_uf"
"                ->  Seq Scan on edi_parc  (cost=0.00..290808.65 rows=3242065 width=76)"
"  ->  Hash Join  (cost=3068260.56..5203903.90 rows=850593 width=1510)"
"        Hash Cond: (ident_surf_bati_uf.id_uf = edi_parc_uf.id_centr)"
"        ->  CTE Scan on ident_surf_bati_uf  (cost=0.00..17011.86 rows=850593 width=576)"
"        ->  Hash  (cost=940953.14..940953.14 rows=13298914 width=1178)"
"              ->  Seq Scan on edi_parc_uf  (cost=0.00..940953.14 rows=13298914 width=1178)"

Suis-je fautif de mauvaises requêtes ou bien ces résultats sont-ils "acceptables" ?

Merci d'avance pour vos conseils.

**Sebwar** · 03/02/2016, 10h18

Hello !

Il n'y aurait pas moyen de segmenter ça par période par exemple pour éviter de faire des SUM sur toute la table ?

sinon tu peux essayer ça, voir ce que ça donne :

Code :

Sélectionner tout - Visualiser dans une fenêtre à part

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
UPDATE cad_2015.edi_parc_uf
  SET
	surf_bati_leger = (
		SELECT SUM(edi_parc.surf_bati_leger)
		FROM cad_2015.edi_parc
		WHERE edi_parc.id_uf = edi_parc_uf.id_centr),
	surf_bati_dur = (
		SELECT SUM(edi_parc.surf_bati_dur)
		FROM cad_2015.edi_parc
		WHERE edi_parc.id_uf = edi_parc_uf.id_centr),
	surf_bati_tot = (
		SELECT SUM(edi_parc.surf_bati_total)
		FROM cad_2015.edi_parc
		WHERE edi_parc.id_uf = edi_parc_uf.id_centr),
	surf_resultante_tt_bati = 
		(SELECT SUM(edi_parc.surf_resultante_tt_bati)
		FROM cad_2015.edi_parc
		WHERE edi_parc.id_uf = edi_parc_uf.id_centr)
;

**escartefigue** · 03/02/2016, 10h52

Bonjour

Quels sont les index de vos tables et avez vous vérifié que vos colonnes de jointure sont de même format ?

**Sebwar** · 03/02/2016, 11h11

Le mieux serait de nous donner la structure des tables

**fafa63** · 03/02/2016, 16h26

Bonjour à tous et merci pour vos différents retours.

Je réponds à tout le monde d'un coup...

Sebwar :

J'ai appliqué ta proposition de requête et j'ai fait un explain. (A cet instant je ne l'ai pas lancée car ma requête d'origine s'étant terminée toutes mes données sont à jour... après de très longues heures de calcul). En voici le résultat.

Code :

Sélectionner tout - Visualiser dans une fenêtre à part

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
"Update on edi_parc_uf  (cost=0.00..2469513859847.10 rows=2065406 width=1185)"
"  ->  Seq Scan on edi_parc_uf  (cost=0.00..2469513859847.10 rows=2065406 width=1185)"
"        SubPlan 1"
"          ->  Aggregate  (cost=298913.82..298913.83 rows=1 width=4)"
"                ->  Seq Scan on edi_parc  (cost=0.00..298913.81 rows=4 width=4)"
"                      Filter: (id_uf = edi_parc_uf.id_centr)"
"        SubPlan 2"
"          ->  Aggregate  (cost=298913.82..298913.83 rows=1 width=4)"
"                ->  Seq Scan on edi_parc edi_parc_1  (cost=0.00..298913.81 rows=4 width=4)"
"                      Filter: (id_uf = edi_parc_uf.id_centr)"
"        SubPlan 3"
"          ->  Aggregate  (cost=298913.82..298913.83 rows=1 width=4)"
"                ->  Seq Scan on edi_parc edi_parc_2  (cost=0.00..298913.81 rows=4 width=4)"
"                      Filter: (id_uf = edi_parc_uf.id_centr)"
"        SubPlan 4"
"          ->  Aggregate  (cost=298913.82..298913.83 rows=1 width=4)"
"                ->  Seq Scan on edi_parc edi_parc_3  (cost=0.00..298913.81 rows=4 width=4)"
"                      Filter: (id_uf = edi_parc_uf.id_centr)"

J'ai aussi relancé un explain sur ma requête avec le serveur au "repos" pour avoir des données comparables.

Code :

Sélectionner tout - Visualiser dans une fenêtre à part

1
2
3
4
5
6
7
8
9
10
11
"Update on edi_parc_uf  (cost=1609810.91..2079271.59 rows=850593 width=1517)"
"  CTE ident_surf_bati_uf"
"    ->  GroupAggregate  (cost=1073593.37..1130730.27 rows=850593 width=76)"
"          ->  Sort  (cost=1073593.37..1081698.53 rows=3242065 width=76)"
"                Sort Key: edi_parc.id_uf"
"                ->  Seq Scan on edi_parc  (cost=0.00..290808.65 rows=3242065 width=76)"
"  ->  Hash Join  (cost=479080.64..948541.32 rows=850593 width=1517)"
"        Hash Cond: (ident_surf_bati_uf.id_uf = edi_parc_uf.id_centr)"
"        ->  CTE Scan on ident_surf_bati_uf  (cost=0.00..17011.86 rows=850593 width=576)"
"        ->  Hash  (cost=146679.06..146679.06 rows=2065406 width=1185)"
"              ->  Seq Scan on edi_parc_uf  (cost=0.00..146679.06 rows=2065406 width=1185)"

Les "cost" sont sensiblement différents, mais je ne sais pas les interpréter et je n'ai rien de trouvé de très clair sur le net ?

Sebwar et escartfigue :

Voici la structure des tables.

Code :

Sélectionner tout - Visualiser dans une fenêtre à part

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
-- Table: cad_2015.edi_parc_uf
 
-- DROP TABLE cad_2015.edi_parc_uf;
 
CREATE TABLE cad_2015.edi_parc_uf
(
  id_prop character varying(12),
  the_geom geometry,
  the_geom_centr geometry,
  gid serial NOT NULL,
  shape_area numeric,
  shape_area_ecrete numeric,
  id_centr character(60),
  jannatmin integer,
  id_uf_ref character(60),
  jannatmin_lots integer,
  nb_locaux integer,
  nb_logts integer,
  nb_proprios_diff_logts integer,
  nb_proprios_occ integer,
  uf_batie_cad11 integer,
  surf_bati_dur integer,
  surf_bati_leger integer,
  surf_bati_tot integer,
  surf_resultante_tt_bati integer,
  surf_bati_dur_cad11 integer,
  surf_phab integer,
  surf_phab_dep integer,
  surf_pprof integer,
  local_maisapp_hab integer,
  local_mais_hab integer,
  local_app_hab integer,
  logt_zone_hab_equipt integer,
  dat_nais_prop_dest_av_imp integer,
  age_prop_dest_av_imp_2015 integer,
  type_prop character varying(60),
  nom_prop_desti_av_imp character varying(60),
  code_insee character(5),
  code_epci character varying(9),
  pgc character varying(1),
  id_zone_plu_12_2014 character(3),
  CONSTRAINT edi_parc_uf_pkey PRIMARY KEY (gid)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE cad_2015.edi_parc_uf
  OWNER TO postgres;
 
-- Index: cad_2015.edi_parc_uf_id_centr
 
-- DROP INDEX cad_2015.edi_parc_uf_id_centr;
 
CREATE INDEX edi_parc_uf_id_centr
  ON cad_2015.edi_parc_uf
  USING btree
  (id_centr COLLATE pg_catalog."default");
 
-- Index: cad_2015.edi_parc_uf_the_geom_gist
 
-- DROP INDEX cad_2015.edi_parc_uf_the_geom_gist;
 
CREATE INDEX edi_parc_uf_the_geom_gist
  ON cad_2015.edi_parc_uf
  USING gist
  (the_geom);

Code :

Sélectionner tout - Visualiser dans une fenêtre à part

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
-- Table: cad_2015.edi_parc
 
-- DROP TABLE cad_2015.edi_parc;
 
CREATE TABLE cad_2015.edi_parc
(
  gid integer,
  objectid numeric(10,0),
  codlot character varying(254),
  id_parc character varying(15),
  numero numeric(10,0),
  codcomm character varying(6),
  codsec character varying(12),
  codsubsec character varying(20),
  figplan character varying(254),
  codarpent character varying(254),
  codmodif character varying(254),
  supf numeric,
  shape_leng numeric,
  shape_area numeric,
  geom geometry(MultiPolygon,2154),
  code_insee character(5),
  jannatmin integer,
  idparref character varying(15),
  jannatmin_lots integer,
  nb_locaux integer,
  local_maisapp_hab integer,
  local_mais_hab integer,
  local_app_hab integer,
  local_maisapp_hab_lots integer,
  local_mais_hab_lots integer,
  local_app_hab_lots integer,
  nb_logts integer,
  nb_proprios_diff_logts integer,
  nb_proprios_occ integer,
  local_copros integer,
  dcntsol integer,
  surf_bati_dur integer,
  surf_bati_leger integer,
  surf_bati_total integer,
  surf_resultante_tt_bati integer,
  surf_bati_dur_cad11 integer,
  surf_phab integer,
  surf_phab_dep integer,
  surf_pprof integer,
  type_prop character varying(60),
  nom_prop_desti_av_imp character varying(60),
  the_geom_centr geometry,
  the_geom_centr_batis geometry(Point,2154),
  id_uf character(60),
  pgc character varying(1),
  code_epci character varying(9),
  id_zone_plu_12_2014 character(3)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE cad_2015.edi_parc
  OWNER TO postgres;
 
-- Index: cad_2015.edi_parc_geom_gist
 
-- DROP INDEX cad_2015.edi_parc_geom_gist;
 
CREATE INDEX edi_parc_geom_gist
  ON cad_2015.edi_parc
  USING gist
  (geom);
 
-- Index: cad_2015.edi_parc_id_parc
 
-- DROP INDEX cad_2015.edi_parc_id_parc;
 
CREATE INDEX edi_parc_id_parc
  ON cad_2015.edi_parc
  USING btree
  (id_parc COLLATE pg_catalog."default");
 
-- Index: cad_2015.edi_parc_the_geom_centr_batis_gist
 
-- DROP INDEX cad_2015.edi_parc_the_geom_centr_batis_gist;
 
CREATE INDEX edi_parc_the_geom_centr_batis_gist
  ON cad_2015.edi_parc
  USING gist
  (the_geom_centr_batis);
 
-- Index: cad_2015.edi_parc_the_geom_centr_gist
 
-- DROP INDEX cad_2015.edi_parc_the_geom_centr_gist;
 
CREATE INDEX edi_parc_the_geom_centr_gist
  ON cad_2015.edi_parc
  USING gist
  (the_geom_centr);

La table edi_parc_uf a bien un index sur le champ id_centr, les champs de liaison sont tous deux en caractères.
Par contre le WITH qui est utilisé pour mettre à jour la table edi_parc_uf, lui, n'en a pas... est-il possible d'ajouter un index dans un WITH ???

De plus, j'enchaîne dans une seule requête lancée d'un coup, de nombreuses requêtes du même type. Le tout pouvant mettre plusieurs jours à ce mettre à jour. Est-ce une mauvaise idée ?
Je me demande si, au bout de nombreuses heures de mises à jour sur une même table, il n'y a pas une forte diminution des performances de la base postgres ?

**escartefigue** · 03/02/2016, 16h49

Envoyé par fafa63

La table edi_parc_uf a bien un index sur le champ id_centr, les champs de liaison sont tous deux en caractères.
Par contre le WITH qui est utilisé pour mettre à jour la table edi_parc_uf, lui, n'en a pas... est-il possible d'ajouter un index dans un WITH ???

De plus, j'enchaîne dans une seule requête lancée d'un coup, de nombreuses requêtes du même type. Le tout pouvant mettre plusieurs jours à ce mettre à jour. Est-ce une mauvaise idée ?
Je me demande si, au bout de nombreuses heures de mises à jour sur une même table, il n'y a pas une forte diminution des performances de la base postgres ?

Le "WITH" comme vous dites n'a pas à avoir ou non d'index, ce n'est qu'une partie d'une instruction SQL
L'index manquant doit être ajouté dans le DDL, comme les autres index, il manque en effet un index sur la colonne id_uf de la table edi_parc.
La bonne nouvelle c'est que les 2 tables ont bien des colonnes de même format, char(60) en l'occurrence

Créez cet index puis re-testez, ça devrait aller beaucoup mieux

Optimisation de requête update

Requêtes PostgreSQL

Vue hybride

Discussions similaires

Partager

Partager