S
S
SwoCoder2017-06-01 01:23:34
PostgreSQL
SwoCoder, 2017-06-01 01:23:34

How to improve query with aggregation condition?

Hello! There is a news table containing the fields source, category, published_on, etc.:

CREATE TABLE news (
  id serial PRIMARY KEY,
  title character varying NOT NULL,
  description text,
  source character varying NOT NULL,
  category character varying NOT NULL,
  published_on date NOT NULL
);

It is necessary to obtain records in such a way that the number of news from categories c1 + c2 from one source is no more than 5 (if there are more than 5 news in category c1, then the last 5 are taken, if less, then the missing number is typed from category c2, but in total it should be no more than 5 news with categories c1 + c2 from one source), you can take any number of news from other categories.
I have a solution, but it is quite a hack:
with t1 as (
  select n.source,(array_agg(n.id order by published_on desc))[1:5] as ids
  from news n
  where n.category = 'c1'
  group by n.source
),
t2 as (
  select n.source,
  t1.ids||(array_agg(n.id order by published_on desc))[1:5-coalesce(cardinality(t1.ids),0)] as ids
  from t1
  left join news n
  on n.source = t1.source
  where n.category = 'c2' AND coalesce(cardinality(t1.ids),0) < 5
  group by n.source,t1.ids
),
t3 as (
  select *
  from t1
  where cardinality(t1.ids) = 5 OR NOT EXISTS (select 1 from news where source = t1.source AND category = 'c2')
  union
  select *
  from t2
)
select n.id,n.title,n.category,n.source,n.published_on
from news n
where n.id in (select unnest(t3.ids) from t3)
OR n.category NOT IN ('c1', 'c2');

Sample dataset:
INSERT INTO news (title, description, source, category, published_on)
VALUES
('n1', 'desc for n1', 's1', 'c1', '2017-03-24'),
('n2', 'desc for n2', 's1', 'c1', '2017-03-29'),
('n3', null, 's1', 'c1', '2017-05-15'),
('n4', null, 's1', 'c2', '2017-04-12'),
('n5', 'desc for n5', 's1', 'c1', '2017-02-14'),
('n6', 'desc for n6', 's1', 'c1', '2017-04-16'),
('n7', null, 's1', 'c1', '2017-04-23'),
('n8', null, 's1', 'c2', '2017-02-02'),
('n9', 'desc for n9', 's2', 'c1', '2017-05-03'),
('n10', 'desc for n10', 's2', 'c1', '2017-05-12'),
('n11', null, 's2', 'c2', '2017-04-14'),
('n12', null, 's2', 'c2', '2017-03-16'),
('n13', null, 's2', 'c2', '2017-05-10'),
('n14', null, 's2', 'c2', '2017-03-10'),
('n15', null, 's2', 'c3', '2017-04-11'),
('n16', null, 's3', 'c3', '2017-04-01'),
('n17', null, 's3', 'c4', '2017-03-26'),
('n18', null, 's4', 'c2', '2017-02-13'),
('n19',null,'s5','c1','2017-03-08');

And the result is the following:
id | title | category | source | published_on
----+-------+----------+--------+--------------
1 | n1 | c1 | s1 | 2017-03-24
2 | n2 | c1 | s1 | 2017-03-29
3 | n3 | c1 | s1 | 2017-05-15
6 | n6 | c1 | s1 | 2017-04-16
7 | n7 | c1 | s1 | 2017-04-23
9 | n9 | c1 | s2 | 2017-05-03
10 | n10 | c1 | s2 | 2017-05-12
11 | n11 | c2 | s2 | 2017-04-14
12 | n12 | c2 | s2 | 2017-03-16
13 | n13 | c2 | s2 | 2017-05-10
15 | n15 | c3 | s2 | 2017-04-11
16 | n16 | c3 | s3 | 2017-04-01
17 | n17 | c4 | s3 | 2017-03-26
18 | n18 | c2 | s4 | 2017-02-13
19 | n19 | c1 | s5 | 2017-03-08
You can look at pastebin'e
Undoubtedly, there should be a more competent solution here, which I would be glad to see. Thank you.

Answer the question

In order to leave comments, you need to log in

1 answer(s)
M
mishania666, 2017-06-01
@SwoCoder

SELECT id, title, description, "source", category, published_on
FROM (
  SELECT *, rank() OVER (PARTITION BY "source" ORDER BY category, published_on DESC)
  FROM news 
  WHERE category IN ('c1', 'c2')
) t
WHERE rank < 6
UNION
SELECT *
FROM news
WHERE category NOT IN ('c1', 'c2')

Didn't find what you were looking for?

Ask your question

Ask a Question

731 491 924 answers to any question