Key AI Papers for January 29

최술사, January 29, 2026, 15:02

1. Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
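   - MathForge is a dual framework that combines difficulty-aware policy optimization with multi-aspect question reformulation to improve mathematical reasoning, addressing limitations of existing reinforcement learning methods. [Read more](https://arxiv.org/abs/2601.20614)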
   ![MathForge](https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2601.20614.png)

2. Advancing Open-source World Models
   - LingBot-World is an open-source world simulator that offers high-fidelity dynamics, long-term memory, and real-time interaction across diverse environments. [Read more](https://arxiv.org/abs/2601.20540)
   ![LingBot-World](https://avatars.df528e9008972c8e5ae4d278e617476c.svg)

3. Spark: Strategic Policy-Aware Exploration via Dynamic Branching for Long-Horizon Agentic Learning
   - Spark is a reinforcement learning framework that improves sample efficiency and generalization by branching at critical decision states, allocating compute strategically. [Read more](https://arxiv.org/abs/2601.20209)
   ![Spark](https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2601.20209.png)

4. DeepSeek-OCR 2: Visual Causal Flow
   - DeepSeek-OCR 2 introduces DeepEncoder V2, which dynamically reorders visual tokens based on semantic content to enable human-like causal reasoning. [Read more](https://arxiv.org/abs/2601.20552)
   ![DeepSeek-OCR 2](https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2601.20552.png)

5. Innovator-VL: A Multimodal Large Language Model for Scientific Discovery
   - Innovator-VL shows that principled training design and a transparent methodology can achieve strong scientific intelligence with reduced data requirements. [Read more](https://arxiv.org/abs/2601.19325)
   ![Innovator-VL](https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2601.19325.png)

6. Linear representations in language models can change dramatically over a conversation
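   - Linear representation directions in language models shift dynamically over the course of a conversation. This affects how factual information is encoded and has implications for interpretability and context-adaptive model behavior. [Read more](https://arxiv.org/abs/2601.20834)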
   ![Linear representations](https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2601.20834.png)

7. SERA: Soft-Verified Efficient Repository Agents
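   - SERA supports cost-efficient training of coding agents through supervised fine-tuning, matching prior performance at a far lower cost than previous methods, and also allows specialization to sensitive codebases. [Read more](https://arxiv.org/abs/2601.20789)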
   ![SERA](https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2601.20789.png)

8. GDCNet: Generative Discrepancy Comparison Network for Multimodal Sarcasm Detection
   - This multimodal sarcasm detection approach uses a generative model to produce stable semantic anchors and measures cross-modal discrepancy, improving accuracy and robustness. [Read more](https://arxiv.org/abs/2601.20618)
   ![GDCNet](https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2601.20618.png)

9. RIR-Mega-Speech: A Reverberant Speech Corpus with Comprehensive Acoustic Metadata and Reproducible Evaluation
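   - This large-scale reverberant speech corpus provides detailed acoustic annotations, supporting standardized comparison and reproducibility in speech processing research. [Read more](https://arxiv.org/abs/2601.19949)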
   ![RIR-Mega-Speech](https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2601.19949.png)

10. SketchDynamics: Exploring Free-Form Sketches for Dynamic Intent Expression in Animation Generation
    - Free-form sketches enable intuitive, dynamic communication that bridges automated content generation and human intent in animation generation tasks. [Read more](https://arxiv.org/abs/2601.20622)
    ![SketchDynamics](https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2601.20622.png)

11. OmegaUse: Building a General-Purpose GUI Agent for Autonomous Task Execution
    - OmegaUse is a general-purpose GUI agent built for high performance on mobile and desktop platforms, relying on high-quality data construction and a decoupled training method. [Read more](https://arxiv.org/abs/2601.20380)
    ![OmegaUse](https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2601.20380.png)

12. SE-DiCoW: Self-Enrolled Diarization-Conditioned Whisper
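    - SE-DiCoW improves speaker-attributed ASR by using diarization output to identify a fixed conditioning setup applied in the cross-attention layers, achieving a substantial reduction in transcription error rate. [Read more](https://arxiv.org/abs/2601.19194)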
    ![SE-DiCoW](https://cdn-thumbnails.huggingface.co/social-thumbnails/papers/2601.19194.png)