Published Mar 13, 2024
Datasets
RAF-DB
FERPlus
AffectNet (7-class and 8-class)
Existing works focus separately on inter-class similarity, intra-class discrepancy, and scale sensitivity. POSTER aims to address all three problems jointly.
techniques
Two-stream design
Image stream: carries global features such as cheeks, forehead, and tear drops that the landmarks don't capture.
Landmark stream: reduces the effect of the image background and focuses on the salient regions.
Motivation for the transformer-based cross-fusion block: let the two streams guide each other; this design alleviates inter-class similarity and intra-class discrepancy. The pyramid, meanwhile, reduces the effect of scale sensitivity.
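The cross-fusion idea can be sketched as mutual cross-attention: each stream's queries attend to the other stream's keys/values. This is a minimal sketch with my own names and dimensions, not POSTER's exact architecture.

```python
import torch
import torch.nn as nn

class CrossFusionSketch(nn.Module):
    """Toy two-stream cross-attention: landmark queries attend to image
    tokens and vice versa. Dimensions are illustrative only."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.lm_to_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.img_to_lm = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x_lm, x_img):
        # each stream queries the other, so the two streams guide each other
        fused_lm, _ = self.lm_to_img(x_lm, x_img, x_img)
        fused_img, _ = self.img_to_lm(x_img, x_lm, x_lm)
        return fused_lm, fused_img

x_lm = torch.randn(2, 49, 64)   # (batch, landmark tokens, dim)
x_img = torch.randn(2, 49, 64)  # (batch, image tokens, dim)
f_lm, f_img = CrossFusionSketch()(x_lm, x_img)
print(f_lm.shape, f_img.shape)  # torch.Size([2, 49, 64]) torch.Size([2, 49, 64])
```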
Preprocessors (my term)
Facial landmark detector
Cunjian Chen. PyTorch Face Landmark: A fast and accurate facial landmark detector, 2021.
Image backbone
Jiankang Deng, Jia Guo, Niannan Xue, and Stefanos Zafeiriou. ArcFace: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4690–4699, 2019.
cross-fusion transformer encoder
pyramid
Built with Conv1d layers of different kernel sizes and strides.
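A minimal sketch of how Conv1d with different kernel sizes and strides produces a pyramid of token sequences; the specific sizes below are my own illustration, not the paper's exact values.

```python
import torch
import torch.nn as nn

class PyramidSketch(nn.Module):
    """Downsample a token sequence at several scales with Conv1d.
    Kernel sizes/strides are illustrative, not POSTER's exact values."""
    def __init__(self, dim=64):
        super().__init__()
        # Conv1d expects (batch, channels, length), so tokens are transposed
        self.scales = nn.ModuleList([
            nn.Conv1d(dim, dim, kernel_size=1, stride=1),             # full
            nn.Conv1d(dim, dim, kernel_size=3, stride=2, padding=1),  # 1/2
            nn.Conv1d(dim, dim, kernel_size=5, stride=4, padding=2),  # 1/4
        ])

    def forward(self, x):          # x: (batch, tokens, dim)
        x = x.transpose(1, 2)      # -> (batch, dim, tokens)
        return [conv(x).transpose(1, 2) for conv in self.scales]

feats = PyramidSketch()(torch.randn(2, 48, 64))
print([f.shape[1] for f in feats])  # [48, 24, 12]
```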
AttentionBlock
What is self.proj = nn.Linear(dim, dim) for?
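In a standard ViT-style attention block, self.proj is the output projection applied after the attention heads are merged; without it, the heads' outputs would stay independent slices of the feature vector. A minimal sketch (my own simplification, not POSTER's exact code):

```python
import torch
import torch.nn as nn

class AttentionSketch(nn.Module):
    """ViT-style self-attention; self.proj mixes the concatenated heads."""
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)  # output projection across heads

    def forward(self, x):                       # x: (B, N, dim)
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.heads, C // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)    # each: (B, heads, N, head_dim)
        attn = (q @ k.transpose(-2, -1) * self.scale).softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)  # merge heads
        return self.proj(out)                   # mix information across heads

y = AttentionSketch()(torch.randn(2, 49, 64))
print(y.shape)  # torch.Size([2, 49, 64])
```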
x_lm: why is there no positional embedding as in the ViT? Is it just because the landmarks already contain position information?
x_img: its q comes from the landmarks instead of the local tokens, while local attention is also performed.
o1, o2, and o3: why use different ways to extract q, k, and v (2×Conv2d, Conv2d, and none)? Well, I guess it's to bring o1, o2, and o3 to the same shape.
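To sanity-check the guess that the 2×Conv2d / Conv2d / no-conv split exists to equalize shapes: applying strided convs to the larger pyramid levels downsamples them to match the smallest one. A toy sketch with made-up sizes (my own, not the paper's code):

```python
import torch
import torch.nn as nn

# Three pyramid levels at different spatial sizes (made-up dimensions)
o1 = torch.randn(2, 64, 28, 28)  # largest  -> two strided convs
o2 = torch.randn(2, 64, 14, 14)  # middle   -> one strided conv
o3 = torch.randn(2, 64, 7, 7)    # smallest -> used as-is

def down():
    # halves the spatial resolution: out = floor((L + 2*1 - 3) / 2) + 1
    return nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1)

q = nn.Sequential(down(), down())(o1)  # 28 -> 14 -> 7
k = down()(o2)                         # 14 -> 7
v = o3                                 # already 7x7

print(q.shape, k.shape, v.shape)  # all torch.Size([2, 64, 7, 7])
```

So the three outputs end up with identical shapes and can be fed into one attention computation, which is consistent with the guess above.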