MySQL Query Optimierung: Alle Einträge holen die doppelt vorhanden sind

Wenn man sich alle doppelten Einträge einer Tabelle anzeigen lassen möchte, kann man auf folgende Query zurückgreifen:

SELECT * FROM `articles` WHERE `ean` IN (SELECT `ean` FROM `articles` GROUP BY `ean` HAVING count(`ean`) > 1  );

1	SELECT * FROM `articles` WHERE `ean` IN (SELECT `ean` FROM `articles` GROUP BY `ean` HAVING count(`ean`) > 1 );

Leider habe ich bei großen Tabellen bemerkt, das diese Query sehr lange dauert.
Mein erster Gedanke war natürlich nachzuprüfen ob ein Index auf die Spalte ean gesetzt ist.
Nachdem der Index gesetzt wurde hat sich aber an der Geschwindigkei der Abfrage nichts getan. Aber warum nur? Denn theoretisch müsste diese Abfrage ja diesen Index 2x nutzen. Einmal bei der äüßeren und einmal bei der inneren Abfrage.

Dies kann man mit EXPLAIN schnell nachprüfen:

id    select_type         table      type    possible_keys    key    key_len	ref       rows    Extra
1     PRIMARY             articles   ALL     Null             Null   Null       Null    357039    Using where
2     DEPENDENT SUBQUERY  articles   index   Null             ean    63         Null         1    Using index

id select_type table type possible_keys key key_len ref rows Extra

1 PRIMARY articles ALL Null Null Null Null 357039 Using where

2 DEPENDENT SUBQUERY articles index Null ean 63 Null 1 Using index

Sofort sieht man, das der Index bei der äußeren Abfrage nicht von benutzt wird.
Das ist seltsam, weil wenn man die Abfragen einzeln ausführt, der Index jeweils benutzt wird.

Innere Abfrage:

EXPLAIN SELECT `ean` FROM `articles` GROUP BY `ean` HAVING count(`ean`) > 1;

1	EXPLAIN SELECT `ean` FROM `articles` GROUP BY `ean` HAVING count(`ean`) > 1;

id    select_type     table      type     possible_keys    key    key_len    ref     rows      Extra
1     SIMPLE          articles   index    Null             ean    63         Null    357039    Using index

1 2	id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE articles index Null ean 63 Null 357039 Using index

Äußere Abfrage

EXPLAIN SELECT * FROM `articles` WHERE `ean` IN (ERGEBNIS AUS DER VORHERIGEN ABFRAGE);

1	EXPLAIN SELECT * FROM `articles` WHERE `ean` IN (ERGEBNIS AUS DER VORHERIGEN ABFRAGE);

id    select_type     table       type     possible_keys    key    key_len    ref    rows    Extra
1     SIMPLE	      articles    range    ean              ean    63         Null    101    Using where

1 2	id select_type table type possible_keys key key_len ref rows Extra 1 SIMPLE articles range ean ean 63 Null 101 Using where

Warum ist das so? Hat jemand eine Idee?

Lösung

Die Query muss anders aufgebaut werden.

SELECT 
    a.* 
FROM `article` AS a 
WHERE (SELECT count(b.ean) FROM `article` AS b WHERE a.ean=b.article_ean GROUP BY b.ean ) > 1
ORDER BY a.ean ASC

SELECT

a.*

FROM `article` AS a

WHERE (SELECT count(b.ean) FROM `article` AS b WHERE a.ean=b.article_ean GROUP BY b.ean ) > 1

ORDER BY a.ean ASC

Thomas Eses

WEBDEVELOPER – PHP, JavaScript, Datenbanken…

Kategorie Tags:

MySQL Query Optimierung: Alle Einträge holen die doppelt vorhanden sind

Lösung

Folgende Artikel könnten auch interessieren

Schreibe einen Kommentar Antworten abbrechen